UnicodeDecodeError - Error while reading the file

Question

I am getting a UnicodeDecodeError when reading a file that contains non-ASCII characters. Here is the snippet of code:

import codecs
import locale

# Show the default encoding the system reports
print locale.getpreferredencoding()

# Both backslashes must be escaped; a bare "\n" would be a newline
fname = "c:\\testing\\nonascii.txt"
f = codecs.open(fname, "r", encoding='utf-8')
sfile = f.read()

print type(sfile)  # it's unicode

print sfile.encode('utf-8')

print type(sfile.encode('utf-8'))

Also give us the error, and where you are getting the error.
– Anand S Kumar, commented Sep 17, 2015 at 1:15

Mark Ransom · Accepted Answer · 2015-09-17 02:16:47Z

1

Judging by the filename, you're using Windows. Files on Windows will not be UTF-8 encoded unless you take special care to save them that way; by default they will use your code page.

If you don't know what code page Windows is using, you can use the special encoding mbcs to get what it uses for a default. If you want your program to work on other systems besides Windows, you can use sys.getfilesystemencoding() to get a value that should work on the current system; on Windows it will return mbcs.

import codecs
import sys
f = codecs.open(fname, "r", encoding=sys.getfilesystemencoding())
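If you are not sure whether a given file was saved as UTF-8 or in the system code page, one option is to try both in order. The following is only a minimal sketch of that idea: the helper name `read_text` and the fallback order are assumptions for illustration, and it uses `locale.getpreferredencoding()` (the call already printed in the question) as the system default rather than `sys.getfilesystemencoding()`. It is written to run on both Python 2 and Python 3.

```python
import io
import locale


def read_text(path, encodings=None):
    """Return (text, encoding) for the first encoding that decodes the file.

    Hypothetical helper: the name and the default fallback order are
    illustrative assumptions, not part of any library.
    """
    if encodings is None:
        # Strict UTF-8 first, then the system default
        # (the ANSI code page on a typical Windows setup).
        encodings = ["utf-8", locale.getpreferredencoding(False)]
    last_error = None
    for enc in encodings:
        try:
            with io.open(path, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError as exc:
            # Remember the failure and try the next candidate.
            last_error = exc
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error
```

Calling `read_text(fname)` returns the decoded text together with the encoding that worked. Keep in mind that single-byte code pages such as cp1252 accept almost any byte sequence, so a successful fallback decode does not prove the guess was right.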

edited Sep 17, 2015 at 2:16

answered Sep 17, 2015 at 1:53

Mark Ransom


mhawke · Accepted Answer · 2015-09-17 01:53:56Z

0

Your file is not really UTF-8.

One possibility is that it is UTF-16 with a Byte Order Mark. If this is the problem, your error will be one of:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xfe in position 0: invalid start byte

or

UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

depending on the endianness of the file.

There are other possible encodings that might be in use. If you post the actual traceback we might be able to tell more definitively.
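To check for a Byte Order Mark without waiting for the traceback, you can look at the first few bytes of the file yourself. This is a rough sketch, not a general encoding detector: the function name `sniff_bom` and the `default` parameter are made up for illustration, and a file with no BOM simply gets the default back. It relies only on the BOM constants in the standard `codecs` module and runs on both Python 2 and Python 3.

```python
import codecs
import io

# Longer BOMs come first: the UTF-32 LE BOM starts with the same two
# bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(path, default="utf-8"):
    """Return a codec name based on the file's BOM, or `default` if none.

    Hypothetical helper for illustration only.
    """
    with io.open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            # The "utf-16", "utf-32" and "utf-8-sig" codecs consume the
            # BOM themselves when decoding.
            return name
    return default
```

The result can be passed straight to the open call, for example `codecs.open(fname, "r", encoding=sniff_bom(fname))`.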

answered Sep 17, 2015 at 1:53

mhawke
