
I'm using pandas.read_csv to read a tab delimited file and am running into the error: Error tokenizing data. C error: Expected 364 fields in line 73058, saw 398

After much searching, it seems that the offending entry is: "– SO ,쳌 \\ ?Œ  ø ,d -L ,ú ,‚ ZO

Removing the quotation mark seems to solve things. I have a lot of large files with a lot of strange characters in them, so this will almost certainly happen again. Do I need to strip unmatched quotation marks ahead of time, or is there some way around this?
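To see why a lone quotation mark matters, here is a minimal reproduction on a small hypothetical sample (the column names and values are invented, not taken from the question). With the default quoting, a field that begins with `"` is treated as a quoted field, so the tokenizer keeps consuming tabs and newlines until it finds a closing quote. In this tiny sample the quote is never closed, so pandas reports an "EOF inside string" error; in a large file the run can end at some later quote instead, which can yield a row with the wrong number of fields, as in the question.

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample: row 2 has a field that begins with a
# double quote that is never closed.
raw = 'id\tname\tvalue\n1\talpha\t10\n2\t"beta\t20\n3\tgamma\t30\n'

error_message = None
try:
    # Default quoting (QUOTE_MINIMAL): the leading " opens a quoted field,
    # so the parser reads through the following tabs and newlines.
    pd.read_csv(io.StringIO(raw), sep="\t")
except pd.errors.ParserError as exc:
    error_message = str(exc)

print(error_message)
```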

Answer


read_csv has a quoting argument. From its docstring:

quoting : int or csv.QUOTE_* instance, default None
    Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of
    QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
    Default (None) results in QUOTE_MINIMAL behavior.

These constants come from Python's standard-library csv module and are described in its documentation.

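Try setting quoting=3 (i.e. csv.QUOTE_NONE). With QUOTE_NONE, quote characters are treated as ordinary data, so a stray " no longer opens a quoted field that swallows the following tabs and newlines, and you don't need to strip quotes from the files beforehand.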
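A minimal sketch of the suggested fix, assuming a small hypothetical tab-delimited sample (the column names and values are invented for illustration). Passing csv.QUOTE_NONE is equivalent to quoting=3 but makes the intent clearer than the bare integer.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: row 2 contains a stray, unclosed double quote.
raw = 'id\tname\tvalue\n1\talpha\t10\n2\t"beta\t20\n3\tgamma\t30\n'

# QUOTE_NONE (== 3) makes the parser treat " as a normal character, so the
# stray quote stays in the data instead of merging rows together.
df = pd.read_csv(io.StringIO(raw), sep="\t", quoting=csv.QUOTE_NONE)

print(df.shape)           # (3, 3)
print(df.loc[1, "name"])  # "beta
```

One trade-off to keep in mind: with QUOTE_NONE, fields that are legitimately quoted keep their quote characters in the parsed values, and a tab inside such a field is treated as a delimiter rather than as part of the field.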
