When running the following code (which just prints out file names):

print filename

It throws the following error:

File "myscript.py", line 78, in __listfilenames
print filename
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13: ordinal not in range(128)

So to fix this, I tried changing print filename to print filename.encode('utf-8') which didn't fix the problem.

The script only fails when trying to read a filename such as Coé.jpg.

Any ideas how I can modify filename so the script continues to work when it comes across a special character?

NB. I'm a python noob

1 Answer


filename is already encoded. It is already a byte string and doesn't need encoding again.

But since you asked it to be encoded, Python first has to decode it for you, and it can only do that with the default ASCII encoding. That implicit decoding fails:

>>> 'Coé.jpg'
'Co\xc3\xa9.jpg'
>>> 'Coé.jpg'.decode('utf8')
u'Co\xe9.jpg'
>>> 'Coé.jpg'.decode('utf8').encode('utf8')
'Co\xc3\xa9.jpg'
>>> 'Coé.jpg'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

If you wanted encoded bytestrings, you don't have to do any encoding at all. Remove the .encode('utf8').
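The hidden step can be made visible with a short sketch. It assumes the filename bytes are UTF-8 (consistent with the 0xc3 byte in the traceback); the helper name `ascii_decode_fails` is made up for illustration. Decoding with the ASCII codec is what Python 2 does behind the scenes when `.encode()` is called on a byte string, and it fails the same way when done explicitly:

```python
# Bytes for the filename "Co\xe9.jpg" as a UTF-8 filesystem would return them
# (assumption: the names really are UTF-8).
raw = b'Co\xc3\xa9.jpg'


def ascii_decode_fails(data):
    """Return True if decoding data with the ASCII codec raises an error.

    Hypothetical helper: it spells out the implicit decode that Python 2
    performs when .encode() is called on a byte string.
    """
    try:
        data.decode('ascii')
    except UnicodeDecodeError:
        return True
    return False


# Decoding with the correct codec works and round-trips cleanly.
text = raw.decode('utf-8')
roundtrip = text.encode('utf-8')
```

Here `ascii_decode_fails(raw)` is true, while `roundtrip` is byte-for-byte identical to `raw`, which is why the extra `.encode('utf8')` call adds nothing.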

You probably need to read up on Python and Unicode. I recommend:

The rule of thumb is: decode as early as you can, encode as late as you can. That means when you receive data, decode it to Unicode objects; when you need to pass that information to something else, encode only then. Many APIs can do the decoding and encoding as part of their job; print will encode to the codec used by the terminal, for example.
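As a rough illustration of that rule, here is a minimal sketch. The function names `to_text` and `write_names` are hypothetical, and UTF-8 is an assumed encoding for both input and output; a real script would use whatever its filesystem and terminal actually use.

```python
import io


def to_text(name, encoding='utf-8'):
    """Decode a byte-string filename to Unicode; pass Unicode through unchanged."""
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name


def write_names(names, stream, encoding='utf-8'):
    """Write one filename per line to a binary stream."""
    for name in names:
        text = to_text(name)                 # decode as early as possible
        line = text + u'\n'                  # do all string work in Unicode
        stream.write(line.encode(encoding))  # encode only at the output boundary


# Mixed input: one byte string, one Unicode string.
buf = io.BytesIO()
write_names([b'Co\xc3\xa9.jpg', u'plain.txt'], buf)
```

Because everything between input and output is Unicode, byte strings and Unicode strings never meet in the same expression, which is where the implicit ASCII decode would otherwise be triggered.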


5 Comments

But if I run the same script without the .encode() bit, my script then gives me this error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13: ordinal not in range(128)
@Nadine: perhaps. You could well have other errors in your code where you are mixing unicode and byte strings.
I've updated my question to better describe the problem, as the script works correctly and only fails on special chars. I will review your updated answer and the links now.
@Nadine: are you saying that the full traceback for your error now points to print filename and gives you a UnicodeDecodeError? I am very skeptical that that is the case.
You're right to be skeptical; the issue is a lot deeper than I thought. However, your links have massively helped me understand Python Unicode, and as such I will mark this as accepted.
