Pandas error tokenizing data when field in csv file contains quotation mark

Question

I'm using pandas.read_csv to read a tab-delimited file and am running into this error: Error tokenizing data. C error: Expected 364 fields in line 73058, saw 398

After much searching, it seems that the offending entry is: "– SO ,쳌 \\ ?Œ ø ,d -L ,ú ,‚ ZO

Removing the quotation mark seems to solve the problem. I have many large files containing a lot of strange characters, so this will no doubt happen again. Do I need to strip unpaired quotation marks ahead of time, or is there some way around this?

Andy Hayden · Accepted Answer · 2014-02-06 00:59:50Z


There is a quoting argument for read_csv:

quoting : int or csv.QUOTE_* instance, default None
    Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of
    QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
    Default (None) results in QUOTE_MINIMAL behavior.

These constants are described in the documentation for Python's csv module.

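Try setting quoting=3 (i.e. QUOTE_NONE), which makes the parser treat quotation marks as ordinary characters rather than as the start of a quoted field.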
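
As a minimal sketch of that suggestion, the snippet below reads a small tab-delimited sample with quoting disabled. The sample data and its column names (id, note, value) are made up for illustration and are not from the original file; only the sep and quoting arguments matter here.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-delimited sample (not the asker's data): the second
# data row contains a lone quotation mark at the start of a field.
raw = 'id\tnote\tvalue\n1\tplain\t10\n2\t"stray quote\t20\n3\tlast\t30\n'

# csv.QUOTE_NONE is the named constant for quoting=3. With it, the
# quotation mark is kept as a literal character in the field.
df = pd.read_csv(io.StringIO(raw), sep="\t", quoting=csv.QUOTE_NONE)

print(df)
```

With quoting disabled, the quotation mark stays in the data as part of the field value, so any legitimately quoted fields in the same file would keep their surrounding quotes too and may need cleaning afterwards.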

