I am reading a text file that contains a single word: B\xc3\xa9zier.

I want to convert this to its decoded UTF-8 form, i.e. Bézier, and print it to the console.

My code is as follows:

foo=open("test.txt")  
for line in foo.readlines():  
    for word in line.split():  
        print(word.decode('utf-8'))
foo.close()

The output is:

B\xc3\xa9zier

However, if I do something like this:

>>> print('B\xc3\xa9zier'.decode('utf-8'))

I get the correct output:

Bézier

I am unable to figure out why this is happening.

1 Answer

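It looks as though the file contains the escape sequences as literal text (a backslash, then x, c, 3, and so on) rather than the UTF-8 bytes themselves. Decode with string_escape first to turn the escapes into bytes, then decode those bytes as UTF-8: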

with open('test.txt') as f:
    for line in f:
        for word in line.split():
            print(word.decode('string_escape').decode('utf-8'))


Bézier
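
For readers on Python 3, where str has no decode method and the string_escape codec does not exist, the same two-step idea can be sketched as follows. This is only an illustration, not part of the original answer: the helper name unescape_utf8 is hypothetical, and it assumes the escaped text is pure ASCII.

```python
def unescape_utf8(text):
    """Turn literal escapes such as '\\xc3\\xa9' into the character they encode.

    Hypothetical helper; assumes `text` contains only ASCII characters.
    """
    # Step 1: interpret the backslash escapes. Each \xNN becomes the
    # code point U+00NN, so this yields a str, not bytes.
    unescaped = text.encode("ascii").decode("unicode_escape")
    # Step 2: latin-1 maps code points 0-255 one-to-one onto byte values,
    # which recovers the raw bytes; those bytes are then decoded as UTF-8.
    return unescaped.encode("latin-1").decode("utf-8")


# What the file holds: 13 characters, including the literal backslashes.
word = "B\\xc3\\xa9zier"
decoded = unescape_utf8(word)
print(decoded)
```

The reason the interactive example in the question works is that the interpreter has already processed the escapes in the string literal, whereas the text read from the file still contains the backslashes as ordinary characters.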
9 Comments

Thanks for the response. Could you please help me figure out what is wrong with my code?
@Chauhan You should use the string_escape codec instead of utf-8, because you want to unescape an escaped string, not decode UTF-8.
@jamylak Now I am getting BÃ©zier instead of Bézier.
@Chauhan Then you need word.decode('string_escape').decode('utf-8').
@JanneKarila Thanks a lot. It works. Could you please explain it?