When I open a certain file in Visual Studio Code (VSC) on Windows 10, then edit and save it, VSC seems to replace certain characters with other characters, so that some text in the saved file looks corrupted, as shown below. The default character encoding used in VSC is UTF-8.

Non-corrupted string before saving the file:
“Diff Clang Compiler Log Files”

Corrupted string after saving the file:
�Diff Clang Compiler Log Files�

So, for example, the left double quotation mark “, which in the original file is represented by the byte sequence 0xE2 0x80 0x9C, is converted into 0xEF 0xBF 0xBD upon saving the file. I do not fully understand the root cause, but I have the following assumption:

  1. The original file was saved using the Windows-1252 encoding (I am using a Windows 10 machine with a German keyboard).
  2. VSC incorrectly interprets the file as UTF-8.
  3. Character codes get converted from Windows-1252 into UTF-8 once the file is saved, thus 0xE2 0x80 0x9C becomes 0xEF 0xBF 0xBD.

Is my understanding correct?
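The byte sequences mentioned above can be checked directly in Python. The following is a minimal sketch that uses only the built-in `utf-8` and `cp1252` codecs; it illustrates what each sequence stands for and is not a statement about what VSC does internally. The variable names are chosen for illustration only.

```python
# Left double quotation mark, U+201C.
quote = "\u201c"

utf8_bytes = quote.encode("utf-8")      # its UTF-8 encoding
cp1252_bytes = quote.encode("cp1252")   # its Windows-1252 encoding

# The single Windows-1252 byte is not valid UTF-8 on its own. With
# errors="replace" the decoder substitutes U+FFFD (replacement character).
misread = cp1252_bytes.decode("utf-8", errors="replace")

# Writing the misread text back as UTF-8 stores the bytes of U+FFFD.
saved_bytes = misread.encode("utf-8")

print(utf8_bytes.hex(" "))    # e2 80 9c
print(cp1252_bytes.hex(" "))  # 93
print(saved_bytes.hex(" "))   # ef bf bd
```

In other words, 0xE2 0x80 0x9C is already the UTF-8 encoding of “, and 0xEF 0xBF 0xBD is the UTF-8 encoding of the replacement character �. If the assumption about Windows-1252 holds, a hex viewer should show 0x93 and 0x94 for the quotation marks in the original file.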

Can I somehow detect (with PowerShell or Python code) whether a file uses Windows-1252 or UTF-8 encoding? Or is there no definitive way to determine that? I would be really glad to find a way to avoid corrupting my files in the future :-).
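One possible heuristic, sketched here with the Python standard library only, is to try a strict UTF-8 decode first: text containing non-ASCII characters in Windows-1252 is very unlikely to also be valid UTF-8. The helper name `guess_encoding` is hypothetical, and the result is a guess rather than a proof, because a file that decodes as Windows-1252 could equally be in another single-byte encoding.

```python
def guess_encoding(data: bytes) -> str:
    """Heuristically classify raw bytes (hypothetical helper, not definitive)."""
    # Pure ASCII is identical in UTF-8 and Windows-1252.
    if data.isascii():
        return "ascii"
    # A strict decode raises on any byte sequence that is not valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Python's cp1252 codec rejects the five bytes that Windows-1252
    # leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    try:
        data.decode("cp1252")
        return "cp1252"
    except UnicodeDecodeError:
        return "unknown"


# In-memory samples; for a real file, read it with open(path, "rb").
sample = "\u201cDiff Clang Compiler Log Files\u201d"
print(guess_encoding(sample.encode("utf-8")))
print(guess_encoding(sample.encode("cp1252")))
```

Running such a check before opening a file makes it possible to pick the matching encoding in the editor instead of letting it assume UTF-8.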

Thank you!

  • Switch to UTF-8 Everywhere Commented Jan 18, 2023 at 15:17
  • Is there a way to detect whether a file is encoded with CP1252 or UTF-8? Commented Jan 19, 2023 at 8:04
  • Check the following: stackoverflow.com/… Commented Jan 19, 2023 at 12:16

1 Answer

The encoding of the file can be detected with the help of the python-magic module:

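import magic  # third-party package: pip install python-magic

FILE_PATH = 'C:\\myPath'

def get_file_encoding(file_path):
    # Read the raw bytes so that nothing is decoded before detection.
    with open(file_path, 'rb') as f:
        blob = f.read()
    # mime_encoding=True makes libmagic report the character encoding.
    m = magic.Magic(mime_encoding=True)
    return m.from_buffer(blob)

file_encoding = get_file_encoding(FILE_PATH)
print(f"File Encoding: {file_encoding}")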

1 Comment

+1 for finding python-magic, which I hadn't heard of. Looks like it might even work on Windows: see "You'll need DLLs for libmagic. @julian-r maintains a pypi package with the DLLs, you can fetch it with: pip install python-magic-bin" at https://github.com/ahupp/python-magic
