Let's say I have a Python 3 source file in cp1251 encoding with the following content:
# эюяьъ (some Russian comment)
print('Hehehey')
If I run the file, I'll get this:
SyntaxError: Non-UTF-8 code starting with '\xfd' in file ... on line 1 but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
That's clear and expected: I understand that, in general, a cp1251 byte sequence can't be decoded as UTF-8, which is the default source encoding in Python 3.
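To see the mismatch in isolation, here is a minimal round trip in the interpreter, using the Russian comment text from the example above:

```python
# Encode the comment text with cp1251: each Cyrillic letter becomes
# a single high byte (0xF0-0xFF range).
raw = 'эюяьъ'.encode('cp1251')
print(raw)  # b'\xfd\xfe\xff\xfc\xfa'

# Those bytes are not valid UTF-8: 0xfd is not a legal start byte.
try:
    raw.decode('utf-8')
except UnicodeDecodeError as e:
    print(e)
```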
But if I edit the file as follows:
# coding: utf-8
# эюяьъ (some Russian comment)
print('Hehehey')
everything will work fine.
And that is pretty confusing.
In the 2nd example the source still contains the same cp1251 byte sequence, which is not valid UTF-8, so I would expect the compiler to use the same encoding (UTF-8) to decode the file and terminate with the same error.
I have read PEP 263 but still don't understand why that doesn't happen.
So why does my code work in the 2nd case but terminate with an error in the 1st?
UPD.
In order to check whether my text editor silently re-encoded the file because of the # coding: utf-8 line, let's look at the actual bytes:
(1st example)
23 20 fd fe ff fa fc ...
(2nd example)
23 20 63 6f 64 69 6e 67 3a 20 75 74 66 2d 38 0a
23 20 fd fe ff fa fc ...
These high bytes encode Cyrillic letters in cp1251 and are not valid UTF-8.
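The dump can be cross-checked directly in the interpreter (the hex string below is the byte run after the `23 20` comment prefix):

```python
# Decode the dumped bytes with cp1251 to confirm they are the
# Cyrillic comment text and nothing was re-encoded.
dumped = bytes.fromhex('fdfefffafc')
print(dumped.decode('cp1251'))  # эюяъь
```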
Furthermore, if I edit the source this way:
# coding: utf-8
# эюяъь (some Russian comment)
print('Hehehey')
print('эюяъь')
I'll face the error:
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xfd ...
So, unfortunately my text editor isn't so smart.
Thus, in the above examples the source file is not converted from cp1251 to UTF-8.
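For reference, the asymmetry can be reproduced with the stdlib's tokenize.detect_encoding, which implements the PEP 263 cookie detection in pure Python (a sketch only; the interpreter itself uses its C tokenizer, but the cookie logic is analogous):

```python
import io
import tokenize

# The same bytes as in the hex dumps above: a cp1251 Cyrillic comment
# that is invalid as UTF-8, followed by the print statement.
body = b'# \xfd\xfe\xff\xfa\xfc\n' + b"print('Hehehey')\n"
declared = b'# coding: utf-8\n' + body

# Without a coding declaration, the cookie detector must decode the
# first line as UTF-8 and fails on the 0xfd byte:
try:
    tokenize.detect_encoding(io.BytesIO(body).readline)
except SyntaxError as e:
    print('no declaration:', e)

# With a declaration, it returns the declared encoding after reading
# only the cookie line, never touching the invalid bytes below it:
enc, first_lines = tokenize.detect_encoding(io.BytesIO(declared).readline)
print('declared:', enc)  # utf-8
```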
Comments:
- Check what # coding: utf-8 means in a Python file. Take a hex editor and look at the actual bytes in the file.
- (OP) After # coding: utf-8 there are non-UTF-8 bytes. For example, in both cases I have the '\xfd' byte, which is a cp1251 'э' and which causes an error in the 1st example. So there must be a different explanation.
- I used iconv to convert it from UTF-8 to cp1251, so no editor was involved. The behavior was exactly as the OP describes: a coding: declaration, even one that just declares the implicit UTF-8 decoding explicitly, silenced the error, even when the file contained non-UTF-8 bytes, while failing to provide a coding: declaration triggered the error. This is real, not an editor artifact.