
I am trying to open a file that I know contains some bytes that are not valid UTF-8. In Python 3 I would do this:

open(fileName, 'r', errors='ignore')

Now I need to use Python 2. What is the corresponding way to do this?

Below is my code after switching to codecs:

    with codecs.open('data/journalName1.csv', 'rU', errors="ignore") as file:
        reader = csv.reader(file)
        for line in reader:
            print(line) 

The file is here: https://www.dropbox.com/s/9qj9v5mtd4ah8nm/journalName.csv?dl=0

2
  • Is it possible to share the file? Commented Jun 8, 2015 at 1:35
  • The problem is not specific to this file; many files can cause this error. I am just asking how to cope with the error. Commented Jun 8, 2015 at 2:14

2 Answers


Python 2 does not support this with the built-in open function. Instead, you have to use the codecs module.

import codecs
# errors= only takes effect when encoding= is also given
f = codecs.open(fileName, 'r', encoding='utf-8', errors='ignore')

This works in both Python 2 and Python 3, should you need to switch Python versions in the future.
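
As a minimal sketch of why the encoding argument matters: codecs.open only wraps the file in a decoder when encoding is passed, so errors has no effect without it. The sample bytes and the temporary file below are made up for illustration and are not taken from the asker's file.

```python
import codecs
import os
import tempfile

# Hypothetical sample data: valid UTF-8 text plus one stray byte (0xe6)
# that is not followed by a continuation byte.
raw = b"caf\xc3\xa9,\xe6 journal\n"

# Write the bytes to a temporary file so the example is self-contained.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as out:
    out.write(raw)

# With encoding given, the decoder is active and errors="ignore"
# drops the undecodable byte instead of raising UnicodeDecodeError.
with codecs.open(path, "r", encoding="utf-8", errors="ignore") as f:
    text = f.read()

os.remove(path)
print(repr(text))
```

Note that errors="ignore" silently discards data; errors="replace" is an alternative if you would rather see a marker where bytes were dropped.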


2 Comments

It still fails with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 2028: invalid continuation byte". I am going to add my code and upload the file.
Actually, if I just copy the lines into another file, there is no error. With this file, the old way in Python 3 works, but codecs.open in Python 2 still raises the error. Please help me, thank you!

For UTF-8 encoded files I would suggest the io module.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import io

# The with block closes the file even if read() raises
with io.open('file.txt', 'r', encoding='utf8') as f:
    s = f.read()
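
Since the question feeds the file to csv.reader, here is a hedged sketch that combines io.open with errors='ignore' and CSV parsing. The helper name read_csv_lenient is hypothetical. The Python 2 branch assumes that version's csv module wants byte strings, so each decoded line is re-encoded as UTF-8 before parsing.

```python
import csv
import io
import sys


def read_csv_lenient(path):
    """Hypothetical helper: parse a CSV file, dropping bytes that are not valid UTF-8."""
    # io.open accepts encoding= and errors= on both Python 2 and Python 3.
    with io.open(path, "r", encoding="utf-8", errors="ignore", newline="") as f:
        if sys.version_info[0] >= 3:
            # Python 3: csv works directly on text.
            return list(csv.reader(f))
        # Python 2: csv expects byte strings, so re-encode each line,
        # then decode the parsed cells back to unicode.
        rows = csv.reader(line.encode("utf-8") for line in f)
        return [[cell.decode("utf-8") for cell in row] for row in rows]
```

Calling read_csv_lenient('data/journalName1.csv') would return a list of rows, each a list of text cells, with any undecodable bytes removed.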

2 Comments

"Some errors in the file with UTF-8 encoding" means that it really isn't a pure UTF-8 file.
My guess was the OP got an error: "UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)" or the like. It happens sometimes with UTF-8 decoded strings. Such an error is usually fixed by encoding in UTF-8, not ASCII.
