UnicodeEncodeError when formatting string with % in Python

Question

For the life of me, I cannot figure this out: I am just trying to extract messages and who said them from a .json file. While I cannot disclose those data here, this is the line that does it:

print '<%s> %s' % (x['sender_id'], x['content'][0]['text'])

"x" is the dict containing the things I need to know. The output on each line is to look like so:

<username> The quick brown fox jumps over the lazy dog.

as seen in many IRC logs. Both of the strings in that tuple are Unicode; that is, they are instances of Python's unicode type. I checked. However, when I try to format them into that string, the result is always something like:

UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f52b' in position 26: ordinal not in range(128)

I have tried many things, such as writing this instead:

print u'<%s> %s' % (x['sender_id'], x['content'][0]['text'])

Or:

print '<%s> %s' % (x['sender_id'], x['content'][0]['text']).encode('utf-8')

and I have tried combining those two strategies, and other things besides, but nothing I have tried works. What am I doing wrong?

Benjamin Peterson · Accepted Answer · 2013-08-18 22:47:46Z

The likely cause is that print is writing to a stdout whose encoding is ASCII. Check the value of sys.stdout.encoding to be sure; if it is None (as it is in Python 2 when output is piped or redirected), print falls back to ASCII. Either make sure you only print ASCII strings, or set the stdout encoding to something more reasonable, like UTF-8, with the PYTHONIOENCODING environment variable. Example:

$ PYTHONIOENCODING=utf-8 python myprogram.py
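If changing the environment is not an option, another approach (not the one above, just a related sketch) is to encode the formatted line yourself and write bytes. The helper names format_line and write_line and the sample record are made up for illustration; the record only mimics the shape of "x" from the question.

```python
from __future__ import print_function, unicode_literals

import io


def format_line(sender, text):
    """Build the IRC-style log line as a unicode string."""
    return '<%s> %s' % (sender, text)


def write_line(line, byte_stream, encoding='utf-8'):
    """Encode explicitly, then write bytes, so no implicit ASCII step occurs."""
    byte_stream.write(line.encode(encoding) + b'\n')


# Hypothetical record shaped like the question's "x"; the text is invented.
x = {'sender_id': 'username', 'content': [{'text': 'pew \U0001f52b'}]}
line = format_line(x['sender_id'], x['content'][0]['text'])

# Encoding to ASCII is the step that fails in the question's traceback.
try:
    line.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# BytesIO stands in for a real byte stream so the sketch runs anywhere.
buf = io.BytesIO()
write_line(line, buf)
print(ascii_ok, repr(buf.getvalue()))
```

On Python 2, sys.stdout accepts byte strings directly, so it can be passed as byte_stream; on Python 3 the equivalent byte stream is sys.stdout.buffer.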

3 Comments

readyready15728 Over a year ago

The plot has thickened actually. I noticed that it was only this one character that was a problem, namely: 🔫. The others went through without complaints. Also your advice didn't work. Thank you anyway. I'm sort of beginning to believe that there might be either a bug in Python or a malformed or peculiar / proprietary Unicode character here. Apparently it's supposed to be an image of a pistol: iemoji.com/view/emoji/376/events/pistol-or-revolver

Benjamin Peterson Over a year ago

What is sys.stdout.encoding?

readyready15728 Over a year ago

By default, None. I just looked through bash history and realized that I had written PYTHONENCODING instead of PYTHONIOENCODING. It works now.
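
For anyone hitting the same thing, a quick check along these lines shows both values discussed in this thread at once. It is only a sketch; stdout_report is a made-up helper name, and the environ parameter exists just so the function can be tried without touching the real environment.

```python
from __future__ import print_function

import os
import sys


def stdout_report(environ=None):
    """Return (sys.stdout.encoding, PYTHONIOENCODING value); either may be None."""
    environ = os.environ if environ is None else environ
    encoding = getattr(sys.stdout, 'encoding', None)
    return encoding, environ.get('PYTHONIOENCODING')


# A misspelled variable name (PYTHONENCODING) leaves the second value as None.
print(stdout_report())
```

If the second value is None even though you believe you set it, the variable name is the first thing to double-check.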
