For the life of me, I cannot figure this out. I am just trying to extract messages, and who sent them, from a .json file. I cannot share the data here, but this is the line that does it:
print '<%s> %s' % (x['sender_id'], x['content'][0]['text'])
Here, "x" is the dict that holds the fields I need. Each line of output should look like this:
<username> The quick brown fox jumps over the lazy dog.
as in many IRC logs. Both strings in the tuple are Unicode; that is, they are formally of Python's unicode type (I checked). However, when I try to format them into that string and print it, the result is always something like:
UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f52b' in position 26: ordinal not in range(128)
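A minimal sketch of where this error most likely comes from, assuming the usual cause: the `%` formatting itself succeeds (in Python 2, formatting unicode arguments into a byte-string template yields a unicode result), and the exception is raised afterwards, when `print` has to encode that unicode string for a stdout whose encoding is ASCII, for example when output is piped or redirected, or the locale is C. The sender and message text below are made-up stand-ins for `x['sender_id']` and `x['content'][0]['text']`; the code uses only built-ins and runs under Python 2.7 and 3.

```python
# Hypothetical sample values standing in for x['sender_id'] and the message text
sender = u'username'
text = u'bang \U0001f52b'

# Formatting unicode values into the template works and yields a unicode string
line = u'<%s> %s' % (sender, text)

# Encoding that string as ASCII, which is effectively what Python 2's print does
# when stdout has no usable encoding, raises the same kind of error as above
try:
    line.encode('ascii')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# Encoding as UTF-8 succeeds, because UTF-8 can represent every code point
encoded = line.encode('utf-8')
```

If this reproduces the error, the formatting step is not the problem; the output encoding is.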
I have tried many things, such as writing this instead:
print u'<%s> %s' % (x['sender_id'], x['content'][0]['text'])
Or:
print '<%s> %s' % (x['sender_id'], x['content'][0]['text']).encode('utf-8')
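One detail worth checking in this attempt, shown as a small sketch with made-up values: attribute access binds more tightly than `%`, so `.encode('utf-8')` is looked up on the argument tuple rather than on the formatted string, which raises an `AttributeError` before any formatting happens. Wrapping the whole formatting expression in parentheses applies `.encode` to the result instead. Runs under Python 2.7 and 3.

```python
# Hypothetical sample values standing in for the fields of x
sender = u'username'
text = u'bang \U0001f52b'

# As written in the attempt: .encode is applied to the tuple (sender, text)
try:
    '<%s> %s' % (sender, text).encode('utf-8')
    tuple_error = None
except AttributeError as exc:
    tuple_error = exc

# With parentheses, the formatted unicode string is encoded to UTF-8 bytes
line_bytes = ('<%s> %s' % (sender, text)).encode('utf-8')
```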
I have also tried combining those two approaches, among other things, but nothing works. What am I doing wrong?
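For reference, a minimal sketch of one common workaround, assuming the goal is UTF-8 output: encode each formatted line explicitly and write the resulting bytes to a binary stream, so the output no longer depends on the encoding Python picks for stdout. The function name `write_messages` and the sample data are hypothetical; the dict shape mirrors the question's `x`. The code runs under Python 2.7 and 3.

```python
import io


def write_messages(messages, stream):
    """Write '<sender> text' lines to a binary stream, encoded as UTF-8.

    `messages` is assumed to be an iterable of dicts shaped like the
    question's x: {'sender_id': ..., 'content': [{'text': ...}, ...]}.
    """
    for x in messages:
        line = u'<%s> %s\n' % (x['sender_id'], x['content'][0]['text'])
        # Encode explicitly so the output never depends on stdout's encoding
        stream.write(line.encode('utf-8'))


# Hypothetical data in the shape the question describes
messages = [
    {'sender_id': u'username', 'content': [{'text': u'bang \U0001f52b'}]},
]
buf = io.BytesIO()
write_messages(messages, buf)
```

To print to the terminal instead of a buffer, pass `sys.stdout.buffer` on Python 3; on Python 2, `sys.stdout` itself accepts byte strings.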