Python encoding issue in reading from text file

Question

I am reading a text file that contains a single word, B\xc3\xa9zier.

I want to decode it to its UTF-8 form, i.e. Bézier, and print it to the console.

My code is as follows:

foo = open("test.txt")
for line in foo.readlines():
    for word in line.split():
        print(word.decode('utf-8'))
foo.close()

The output is:

B\xc3\xa9zier

However, if I do something like this:

>>> print('B\xc3\xa9zier'.decode('utf-8'))

I get the correct output:

Bézier

I cannot figure out why this happens.

possible duplicate of Unicode (utf8) reading and writing to files in python
– jamylak, Commented Jun 4, 2013 at 11:22

jamylak · Accepted Answer · 2013-06-04 11:37:21Z

6

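It seems the file contains the escape sequences as literal text (a backslash, an x, and two hex digits for each byte) rather than the UTF-8 bytes themselves. In Python 2, use string_escape to unescape them first, then decode the result as UTF-8: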

with open('test.txt') as f:
    for line in f:
        for word in line.split():
            print(word.decode('string_escape').decode('utf-8'))


Bézier
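
For readers on Python 3, where str has no decode method and the string_escape codec is not available, the same two-step idea might be sketched as below. This is only a sketch: the helper name unescape_utf8 is hypothetical, the sample line stands in for a line read from test.txt, and it assumes the file holds plain ASCII text with \xNN escapes.

```python
def unescape_utf8(word):
    """Hypothetical helper: turn text like r'B\\xc3\\xa9zier' into 'Bézier'."""
    # Step 1: interpret the literal "\xNN" sequences. Each one becomes a
    # single code point in the range U+0000..U+00FF.
    unescaped = word.encode('latin-1').decode('unicode_escape')
    # Step 2: map each code point back to one byte (latin-1 is a 1:1 mapping
    # for that range), then decode those bytes as UTF-8.
    return unescaped.encode('latin-1').decode('utf-8')


# Stand-in for a line read from test.txt (assumption: ASCII-only content).
line = r'B\xc3\xa9zier'
decoded = [unescape_utf8(word) for word in line.split()]
print(decoded[0])
```

The round trip through latin-1 is a design choice made here because it maps code points 0-255 to the same byte values, which keeps the unescaped bytes intact for the final UTF-8 decode.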

edited Jun 4, 2013 at 11:37

user2374515

answered Jun 4, 2013 at 11:11

jamylak



9 Comments

user2374515 Over a year ago

Thanks for the response. Could you please help me figure out what is wrong with my code?

kirelagin Over a year ago

@Chauhan you should use string_escape encoding instead of utf-8 because you want to unescape a unicode string, not decode utf8.

user2374515 Over a year ago

Janne Karila Over a year ago

@Chauhan Then you need word.decode('string_escape').decode('utf-8')

user2374515 Over a year ago

@JanneKarila Thanks a lot. It works. Could you please explain it?
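
As for why two steps are needed, the difference between the string typed at the prompt and the text stored in the file can be sketched as follows. This uses Python 3 syntax with illustrative variable names; the raw string is assumed to match what the file contains.

```python
# Typed in source code: Python interprets the escapes while parsing, so this
# is already the UTF-8 byte sequence for the word.
from_literal = b'B\xc3\xa9zier'

# Read from the file: nothing interprets the escapes, so every backslash,
# 'x', and hex digit is an ordinary character (a raw string models this).
from_file = r'B\xc3\xa9zier'

print(len(from_literal))   # 7 bytes: B, 0xC3, 0xA9, z, i, e, r
print(len(from_file))      # 13 characters, backslashes included
print('\\' in from_file)   # True: the backslashes are real characters
```

Decoding the file text as UTF-8 alone changes nothing visible, because all 13 characters are ASCII. Unescaping first produces the 7 bytes, and decoding those as UTF-8 gives the accented word.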
