I'm having a problem with Python's string.format() and passing Unicode strings to it. This is similar to this older question, except that in my case the test code explodes on the print, not on the logging.info() call. Passing the same Unicode string object to a logging handler works fine.
This fails with the older % formatting just as it does with string.format(). Just to make sure that the string object was the problem, and not print interacting badly with my terminal, I tried assigning the formatted string to a variable before printing it:
```python
def unicode_test():
    byte_string = '\xc3\xb4'
    unicode_string = unicode(byte_string, "utf-8")
    print "unicode object type: {}".format(type(unicode_string))
    output_string = "printed unicode object: {}".format(unicode_string)
    print output_string

if __name__ == '__main__':
    unicode_test()
```
The string object seems to assume it's getting ASCII.
```
% python -V
Python 2.7.2
% python ./unicodetest.py
unicode object type: <type 'unicode'>
Traceback (most recent call last):
  File "./unicodetest.py", line 10, in <module>
    unicode_test()
  File "./unicodetest.py", line 6, in unicode_test
    output_string = "printed unicode object: {}".format(unicode_string)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 0: ordinal not in range(128)
```
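The traceback points at an implicit ASCII encode: in Python 2, formatting a `unicode` argument into a byte-string (`str`) template converts that argument with the default codec, which is ASCII, and `u'\xf4'` is outside its range. The sketch below makes that hidden step explicit. It assumes Python 3, where bytes and text are separate types, and `implicit_ascii_step` is a hypothetical helper standing in for the conversion Python 2 performed behind the scenes.

```python
# Python 3 sketch: reproduce explicitly the ASCII encode that Python 2
# performed implicitly when a unicode argument met a byte-string template.
byte_string = b'\xc3\xb4'                     # UTF-8 bytes for U+00F4
unicode_string = byte_string.decode('utf-8')  # text: 'ô'


def implicit_ascii_step(text):
    """Hypothetical helper: encode with ASCII, as Python 2's default codec did."""
    return text.encode('ascii')


error_message = None
try:
    implicit_ascii_step(unicode_string)
except UnicodeEncodeError as exc:
    # Same failure as in the traceback: U+00F4 has no ASCII representation.
    error_message = str(exc)
    print(error_message)
```

Running it prints the same `'ascii' codec can't encode character` complaint, which suggests the problem is the codec being applied, not the decoded string itself.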
Making the format string itself a Unicode literal, so that output_string should come out as Unicode, doesn't make any difference:

```python
output_string = u"printed unicode object: {}".format(unicode_string)
```
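A commonly used pattern for this kind of problem is to decode bytes once at the input boundary, keep text internally, and encode explicitly at the output boundary instead of relying on a default codec. The sketch below assumes Python 3; `render` and `write_utf8` are hypothetical helper names, and an in-memory `io.BytesIO` stands in for a real binary output stream.

```python
import io


def render(byte_string):
    # Decode at the input boundary; everything after this is text (str).
    text = byte_string.decode('utf-8')
    # Unicode template plus Unicode argument: no implicit encoding happens here.
    return u"printed unicode object: {}".format(text)


def write_utf8(text, stream):
    # Encode explicitly at the output boundary with a named codec.
    stream.write(text.encode('utf-8'))


buffer = io.BytesIO()  # stands in for a binary stream such as sys.stdout.buffer
output_string = render(b'\xc3\xb4')
write_utf8(output_string, buffer)
```

Because the only encode is the explicit `text.encode('utf-8')`, the result no longer depends on whatever default or terminal encoding happens to be in effect.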
Am I missing something here? The documentation for the string object seems pretty clear that this should work as I'm attempting to use it.
Comment: `printed unicode object` with `u` works for me (Python 2.6.5 and 2.7). Is the error you get when you do that the same one as listed above?

Comment: `'\xc3\xb4'`: `ô` or `ô`? `u`.

Comment: `import sys; print sys.getdefaultencoding()`
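Along the lines of the `sys.getdefaultencoding()` suggestion, a small diagnostic can report the encodings that typically matter here. This is a Python 3 sketch (an assumption; the question uses Python 2), and `encoding_report` is a hypothetical helper name. On Python 2 the default encoding is normally `'ascii'`; on Python 3 it is `'utf-8'`.

```python
import locale
import sys


def encoding_report():
    """Hypothetical helper: collect the encodings relevant to str/bytes conversion."""
    return {
        # Codec for implicit str<->bytes conversions ('ascii' on Python 2, 'utf-8' on Python 3).
        "default": sys.getdefaultencoding(),
        # Encoding used when printing to this stream; may depend on the terminal,
        # locale, or PYTHONIOENCODING, and can be None for replaced streams.
        "stdout": getattr(sys.stdout, "encoding", None),
        # The locale's preferred encoding, without changing the current locale.
        "locale": locale.getpreferredencoding(False),
    }


for name, value in encoding_report().items():
    print(name, value)
```

Comparing the `stdout` value with what the terminal actually expects is a quick way to tell whether a failure comes from formatting or from printing.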