I'm reading a UTF-8 encoded file. When I print the text directly, everything is fine. When I print the text from a class using msg.__str__(), it works too. But I don't know how to print it with just str(msg), because that always raises the error "'ascii' codec can't encode character u'\xe4' in position 10: ordinal not in range(128)" if the text contains an umlaut.

Example Code:

 #!/usr/bin/env python
 # encoding: utf-8

 import codecs
 from TempClass import TempClass

 file = codecs.open("person.txt", encoding="utf-8")
 message = file.read() #I am Mr. Händler.

 #works
 print message

 msg = TempClass(message)
 #works
 print msg.__str__()
 #works
 print msg.get_string()

 #error
 print str(msg)

And the class:

class TempClass(object):

    def __init__(self, text):
        self.text = text

    def get_string(self):
        return self.text

    def __str__(self):
        return self.text

I tried to decode and encode the text in several ways but nothing works for me.

Help? :)

Edit: I am using Python 2.7.9

  • Why would you want to do this? print u'\xe4' already prints ä. Commented Mar 27, 2015 at 20:26
  • Why do you need to use str(msg)? msg is already a string. I can't seem to reproduce the problem (Python 3.4.2), either. Commented Mar 27, 2015 at 20:31
  • @TigerhawkT3, the OP is using Python 2, which is very different from Python 3; also, the type is unicode, not str. Commented Mar 27, 2015 at 20:31
  • @Matthias D. It works when you call the methods because they hand back the unicode object you passed to your class unchanged; str(msg), by contrast, tries to encode it to ASCII. Commented Mar 27, 2015 at 20:35
  • In Python 2 this can be very helpful: from __future__ import unicode_literals. Commented Mar 27, 2015 at 20:42

1 Answer

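Because message (and msg.text) are not str but unicode objects. In Python 2, str() expects __str__ to return a byte string (str); when it gets a unicode object back, Python implicitly encodes it with the default ascii codec, which fails on the umlaut. To make str() work you need to encode the text as UTF-8 explicitly. Your __str__ method should look like: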

def __str__(self):
    return self.text.encode('utf-8')

unicode can be implicitly encoded to str if it contains only ASCII characters, which is why you only see the error when the input contains an umlaut.
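
The implicit conversion can be imitated with an explicit call. The following is only an illustrative sketch, not part of the original code: it encodes a sample text with the ascii codec, which is roughly what Python 2 does behind the scenes when __str__ returns a unicode object, and then with utf-8. The variable names and the "Mr. Smith" sample are assumptions made for the example, and the snippet is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; names and sample strings are assumptions.

plain = u"I am Mr. Smith."        # ASCII-only text
umlaut = u"I am Mr. H\xe4ndler."  # \xe4 is the umlaut from the question

# ASCII-only text survives the ascii codec without trouble.
ascii_bytes = plain.encode("ascii")

# Text containing an umlaut cannot be represented in ASCII.
try:
    umlaut.encode("ascii")
    ascii_failed = False
    error_message = ""
except UnicodeEncodeError as exc:
    ascii_failed = True
    error_message = str(exc)

# Encoding explicitly as UTF-8 works; the umlaut becomes two bytes.
utf8_bytes = umlaut.encode("utf-8")
```

Here error_message should name the same position 10 as the error quoted in the question, since the umlaut is the eleventh character of the sample text.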

2 Comments

When I change it, the umlauts are not printed. They are printed if I also change the print line to str(msg).decode('utf-8'). That seems like a strange way to do it.
It worked for me, but my terminal encoding is UTF-8. Maybe yours isn't? Your extra .decode('utf-8') is converting it back to unicode, which then gets encoded properly for your terminal. Is there a reason for not using unicode (or Python 3)? str == bytes in Python 2, which means if you use it for text you're implicitly assuming ASCII encoding (which breaks when you have non-ASCII text).
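
The round trip described in this comment can be sketched as follows. This is a minimal illustration under assumed variable names, not code from the thread: encoding to UTF-8 yields bytes, and decoding those bytes yields the original unicode text again, which print can then encode for whatever the terminal expects. It should run on Python 2.7 and Python 3 alike.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; variable names are assumptions.

text = u"I am Mr. H\xe4ndler."

# What the suggested __str__ returns: UTF-8 encoded bytes.
encoded = text.encode("utf-8")

# What the extra .decode('utf-8') does: back to unicode text.
decoded = encoded.decode("utf-8")

round_trip_ok = decoded == text
```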
