2

I have a file containing

    foo = "Gro\xdfbritannien"

I'm reading it with the following code, but it always prints the original text with the literal `\xdf` escape:

    import codecs
    f = codecs.open('myfile', 'r', 'utf8')
    for line in f:
      print line
      print line.encode('utf-8')
      print line.decode('utf-8')

I can't see how to display the properly decoded text, the way it appears when I do

    >>> print u'Gro\xdfbritannien'
    Großbritannien

Any hint would be appreciated!

1
  • If your file literally has a quoted string with a backslash and an x in it, you'll need to parse the string literal with something like decode('string-escape'). Commented Feb 13, 2014 at 9:10

2 Answers

4

When your file contains the line

    foo = "Gro\xdfbritannien"

it contains an actual backslash character, followed by the characters x, d and f. So when that line is read into a Python string, the result is

    'foo = "Gro\\xdfbritannien"'

(and since those are all ASCII characters, it doesn't matter if you open it with the utf-8 codec or not).
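
A quick way to confirm this is to inspect the line as it is actually read. The following is a minimal sketch written for Python 3 (the question itself uses Python 2); `io.StringIO` stands in for the asker's file, and the names `fake_file` and `escape_chars` are made up for illustration.

```python
import io

# Stand-in for the file on disk: a raw string keeps the backslash literal,
# just as it would be stored in the file.
fake_file = io.StringIO(r'foo = "Gro\xdfbritannien"' + "\n")

line = fake_file.readline().rstrip("\n")

# The "escape" is really four separate ASCII characters: \ x d f
escape_start = line.index("\\")
escape_chars = line[escape_start:escape_start + 4]

print(repr(line))
print(list(escape_chars))
print(all(ord(c) < 128 for c in line))  # True: nothing but ASCII in the line
```

Because every character is plain ASCII, no choice of file encoding can turn the escape into ß; it has to be interpreted explicitly.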

So you need to decode it first using the string_escape codec:

    >>> foo = 'Gro\\xdfbritannien'  # the text as read from the file
    >>> foo.decode("string_escape")
    'Gro\xdfbritannien'

and then decode the resulting byte string into a Unicode object (latin1 here, since the byte 0xdf is ß in that encoding):

    >>> _.decode("latin1")
    u'Gro\xdfbritannien'

which you can then print

    >>> print _
    Großbritannien
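
On Python 3 the `string_escape` codec and `str.decode` no longer exist, so the two steps above need a different spelling. The following is a sketch under that assumption and is not part of the original answer: it encodes the text to bytes as Latin-1 and decodes it with the `unicode_escape` codec, which interprets `\xdf` as U+00DF. The helper name `unescape` is hypothetical.

```python
def unescape(text):
    """Turn literal backslash escapes such as \\xdf into the characters they name.

    Assumes every character in `text` fits in Latin-1; anything outside
    that range raises UnicodeEncodeError.
    """
    return text.encode("latin-1").decode("unicode_escape")


# The line as read from the file: the backslash is a literal character.
line = r'foo = "Gro\xdfbritannien"'
decoded = unescape(line)
print(decoded)
```

With the sample line this prints `foo = "Großbritannien"`. Text without any escapes passes through unchanged.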

1 Comment

Thanks - works perfectly with print line.decode("string_escape").decode("latin1")
-1

This has nothing to do with codecs. You should write it like this: 'foo = "Gro\xdfbritannien"'

    >>> print u'Gro\\xdfbritannien'
    Gro\xdfbritannien
