For the life of me, I cannot figure this out: I am just trying to extract messages and who said them from a .json file. While I cannot disclose those data here, this is the line that does it:

print '<%s> %s' % (x['sender_id'], x['content'][0]['text'])

"x" is the dict containing the things I need to know. Each line of output should look like this:

<username> The quick brown fox jumps over the lazy dog.

as seen in many IRC logs. Anyway, both of the strings in the tuple there are Unicode. That is to say, they are formally of the Python unicode type. I checked. However, when I try to format them into that string, the result is always something like:

UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f52b' in position 26: ordinal not in range(128)

I have tried many things, such as writing this instead:

print u'<%s> %s' % (x['sender_id'], x['content'][0]['text'])

Or:

print '<%s> %s' % (x['sender_id'], x['content'][0]['text']).encode('utf-8')

and I have tried combining those two strategies, and other things besides, but nothing I have tried works. What am I doing wrong?

1 Answer

The problem is most likely that print is writing to a stdout whose encoding is ASCII, so the implicit encode fails on any non-ASCII character. Check the value of sys.stdout.encoding to be sure. Either make sure you only print ASCII strings, or set the stdout encoding to something more reasonable like UTF-8 with the PYTHONIOENCODING environment variable. Example:

$ PYTHONIOENCODING=utf-8 python myprogram.py
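
If changing the environment is not an option, another approach is to encode explicitly and write bytes, so the implicit ASCII codec never gets involved. The following is only a minimal sketch: the helper names (stdout_encoding, format_line, to_bytes, write_line) are made up for illustration and are not part of the question's code, and UTF-8 is assumed to be what the terminal or output file expects.

```python
# -*- coding: utf-8 -*-
import sys


def stdout_encoding(default='utf-8'):
    """Return the encoding stdout reports, or `default` if it reports none."""
    return getattr(sys.stdout, 'encoding', None) or default


def format_line(sender, text):
    """Build the IRC-style line as a unicode string (hypothetical helper)."""
    return u'<%s> %s' % (sender, text)


def to_bytes(line, encoding='utf-8'):
    """Encode the whole formatted line explicitly (UTF-8 is an assumption)."""
    return line.encode(encoding)


def write_line(line, stream=None):
    """Write the encoded line as raw bytes.

    Python 3 text streams expose the byte stream as `.buffer`;
    Python 2's sys.stdout accepts byte strings directly.
    """
    stream = stream if stream is not None else sys.stdout
    out = getattr(stream, 'buffer', stream)
    out.write(to_bytes(line) + b'\n')
```

With the dict from the question, this would be called as write_line(format_line(x['sender_id'], x['content'][0]['text'])). Note that the encode is applied to the finished string, not to the tuple of arguments.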

3 Comments

The plot has thickened actually. I noticed that it was only this one character that was a problem, namely: 🔫. The others went through without complaints. Also your advice didn't work. Thank you anyway. I'm sort of beginning to believe that there might be either a bug in Python or a malformed or peculiar / proprietary Unicode character here. Apparently it's supposed to be an image of a pistol: iemoji.com/view/emoji/376/events/pistol-or-revolver
What is sys.stdout.encoding?
By default, None. I just looked through bash history and realized that I had written PYTHONENCODING instead of PYTHONIOENCODING. It works now.
