I am using the code below to read a CSV file into a DataFrame, but it fails with pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2. As suggested here, I changed pd.read_csv('D:/TRYOUT.csv') to pd.read_csv('D:/TRYOUT.csv', error_bad_lines=False). Now the same line raises a different error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 1: invalid continuation byte.

import pandas as pd

def ExcelFileReader():
    # error_bad_lines=False skips rows with an unexpected number of fields
    mergedf = pd.read_csv('D:/TRYOUT.csv', error_bad_lines=False)
    return mergedf
  • Could you supply an example CSV file which causes a failure? Commented Aug 10, 2015 at 22:26
  • This question is similar to: UnicodeDecodeError when reading CSV file in Pandas. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. Commented Sep 24 at 11:10

3 Answers

If you're on Windows, the file is probably not UTF-8 encoded, so you likely need to pass the encoding explicitly: pd.read_csv(filename, encoding='latin-1')
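As a rough illustration of the latin-1 suggestion, here is a minimal sketch that reads an in-memory CSV instead of a real file. The sample data and the helper name read_latin1 are made up for the example; the letter "ð" is stored as the single byte 0xF0 in latin-1, which is the byte the error message complains about.

```python
import io

import pandas as pd

# Hypothetical sample data: in latin-1, "ð" is the single byte 0xF0,
# which is not valid on its own in UTF-8.
raw = "name,city\nGuðrún,Reykjavík\n".encode("latin-1")


def read_latin1(source):
    """Read a CSV whose bytes are latin-1 (ISO-8859-1) encoded."""
    return pd.read_csv(source, encoding="latin-1")


df = read_latin1(io.BytesIO(raw))
print(df)
```

Note that latin-1 maps every possible byte to some character, so this call never raises a decode error. If the file is actually in a different encoding, the text will load but may come out garbled, so it is worth checking a few rows by eye.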


I had a similar problem and had to use utf-8-sig as the encoding.

The reason I used utf-8-sig is that, if you ever get non-Latin characters, an encoding like latin-1 won't be able to deal with them correctly. There are a few ways of getting around the problem, so choose whichever best suits your needs.
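To make the utf-8-sig option concrete, here is a small sketch under the assumption that the file is UTF-8 with a leading byte order mark (BOM), which some Windows tools write. The sample bytes are invented for the example.

```python
import io

import pandas as pd

# Hypothetical sample data: UTF-8 text preceded by the three-byte BOM.
raw = b"\xef\xbb\xbf" + "name,city\nЮрий,Москва\n".encode("utf-8")

# utf-8-sig decodes UTF-8 and drops a leading BOM if one is present.
df_sig = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")
print(list(df_sig.columns))
print(df_sig)
```

This only helps when the bytes really are UTF-8. If the file contains bytes that are invalid in UTF-8, utf-8-sig will raise the same UnicodeDecodeError as plain utf-8.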

Hope that helps.


If you would like to skip the rows that cause errors and ignore bytes that cannot be decoded, then you need to use:

pd.read_csv(file_path, encoding="utf8", error_bad_lines=False, encoding_errors="ignore")
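One caveat on the call above: error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0, where on_bad_lines="skip" takes its place. The sketch below shows what the same idea might look like on pandas 1.3 or later. The sample bytes are made up: one row has an extra field and another contains a stray 0xF0 byte.

```python
import io

import pandas as pd

# Hypothetical sample data: the row "3,4,5" has one field too many,
# and the last row contains the byte 0xF0, which is invalid UTF-8 here.
raw = b"a,b\n1,2\n3,4,5\n6,x\xf0y\n"

df_clean = pd.read_csv(
    io.BytesIO(raw),
    encoding="utf-8",
    encoding_errors="ignore",  # silently drop bytes that cannot be decoded
    on_bad_lines="skip",       # skip rows with too many fields
)
print(df_clean)
```

Both options discard data without reporting it, so this is best treated as a way to get a first look at a messy file rather than a fix for the underlying encoding.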
