1

I have a batch script which does a curl command using:

powershell -Command "curl.exe -k -S -X GET -H 'Version: 16.0' -H 'Accept: application/json' 'https://IntendedURL' > result.json".

I am then using the result.json as an input file to another program.

The issue I am facing is that running the batch script results in a result.json file with format UTF-16 LE BOM. With UTF-16 LE BOM format, the .json file cannot even be opened and observed directly using a normal browser. It has exception of "SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data".

I did some searching around and the answer that came back is that curl should not result in BOM encoding being saved. Nonetheless, I ended up with result.json which is with BOM encoding. If there is something that can be done to the batch script to make it output without BOM, help is appreciated.

For the second part of my issue, I can manually copy the content of result.json and save it in a new text document and specifically choose UTF-8 format. I searched around as well and found a bit of a solution like here: Batch script remove BOM () from file , but the solution script to remove the BOM seems rather complicated.

Thus, any help to remove the BOM encoding without resorting to manually copying the content and saving it in the specific desired format is appreciated as well.

Best regards, cyborg1234

4
  • 1
    Read following : stackoverflow.com/questions/56724398/… Commented Jul 28, 2023 at 12:38
  • @jdweng, the linked post is about Python's character-encoding choices, which do not apply here. Here, it is PowerShell's character-encoding behavior that is the problem. Commented Jul 28, 2023 at 18:05
  • @mklement0 : The issue is Python and the solution in the link to save the stdeout to an utf-8 file seems reasonable. Commented Jul 28, 2023 at 18:58
  • @jdweng, if you tried to use PowerShell to save to a UTF-8 file (which is what this question is about), you'd end up with (a) a character-encoding mismatch and misinterpreted non-ASCII characters and (b) in Windows PowerShell with a UTF-8 file with a BOM, both of which must be avoided here. Nothing in the linked post is remotely of any help with these issues. Commented Jul 28, 2023 at 19:03

1 Answer 1

1

Note:

  • See the bottom section for a way to avoid the problem by calling curl.exe directly from your batch file.

In PowerShell, > is (in effect) an alias of the Out-File cmdlet, whose default in Windows PowerShell is UTF-16LE (in PowerShell (Core) 7+, it is now the more sensible (BOM-less) UTF-8, across all cmdlets).

You have two options (spoiler alert: choose the 2nd):

  • In principle, you can use Out-File explicitly (or preferably, with text input, Set-Content), in which case you can use its -Encoding parameter to control the output character encoding. However:

    • In Windows PowerShell -Encoding UTF8 invariably creates a UTF-8 file with a BOM. Short of using .NET APIs directly, the only way to create BOM-less UTF-8 files in Windows PowerShell is to use New-Item - see this answer.

    • Even clearing that hurdle is typically not enough, and the following also affects PowerShell (Core) up to v7.3.x (but is no longer a problem in v7.4+ - see this answer):

      • When PowerShell captures output from external programs it currently invariably decodes them into .NET strings first - even when redirecting to a file - using the character encoding stored in [Console]::OutputEncoding, and then re-encodes on output, based on the specified or default character encoding of the cmdlet used.

      • If that encoding doesn't match the character encoding of what curl.exe outputs, the output will be misinterpreted - and given that [Console]::OutputEncoding defaults to the legacy system locale's OEM code page, misinterpretation will occur if the output is UTF-8-encoded, so - unless you've explicitly configured your system to use UTF-8 system-wide (see this answer) - you'll need to (temporarily) set [Console]::OutputEncoding = [Text.UTF8Encoding]::new() before the curl.exe call.

  • Preferably, let curl.exe itself write the output file, i.e. replace > result.json with -o result.json: this will save the output as-is to the target file.


Taking a step back:

  • You can make your curl.exe call directly from your batch file, which avoids the character-encoding problems.

  • cmd.exe's > operator, unlike PowerShell's is a raw byte conduit, so no data corruption should occur - that said, you're free to use -o result.json instead.

  • When calling from cmd.exe (a batch file), only "..." quoting is supported, so the '...' strings have been transformed into "..." ones below.

:: Directly from your batch file, using "..." quoting:
curl.exe -k -S -X GET -H "Version: 16.0" -H "Accept: application/json" "https://IntendedURL" > result.json
Sign up to request clarification or add additional context in comments.

2 Comments

Hi @mklement0, I have tried the method of changing ">" to "-o" and the method of calling curl.exe directly without the "powershell -Command". Both methods result in result.json of the format UTF-8 which seems to be what I want to achieve. However, when I open this UTF-8 result.json file in a normal browser, the exception of "SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data" still shows. And when I manually save the content of the file into new UTF-8 text file, the exception occurs too, which differs from when I saved the content from UTF-16 LE BOM manually.
@cyborg1234, so you now have a UTF-8 file without a BOM, but the file doesn't contain valid JSON, due to characters at the beginning? If you inspect the file visually in a text editor, does it look like valid JSON? Is ConvertFrom-Json able to parse it? Here's an example of content that would break JSON parsing: 'x' | ConvertFrom-Json

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.