I'm trying to use Text::CSV to parse this CSV file. Here is how I am doing it:

open my $fh, '<', 'test.csv' or die "can't open csv";
my $csv = Text::CSV_XS->new ({ sep_char => "\t", binary => 1 , eol=> "\n"});
$csv->column_names($csv->getline($fh));

while(my $row = $csv->getline_hr($fh)) {
    # use row
}

Because the file has 169,252 rows (not counting the headers line), I expect the loop to run that many times. However, it only runs 8 times and gives me 8 rows. I'm not sure what's happening, because the CSV just seems like a normal CSV file with \n as the line separator and \t as the field separator. If I loop through the file like this:

while(my $line = <$fh>) {
    my $fields = $csv->parse($line);
}

Then the loop goes through all rows.

  • Take a good look at the file in a text editor with Show All Characters/Show Whitespace turned on, at about the 8th or 9th row. Perhaps there's an extra tab, a missing tab, or an unescaped one. I would also try importing the CSV file into a spreadsheet, just to check that all the columns are laid out correctly. Commented Aug 10, 2016 at 1:49
  • Can you show a few consecutive rows, some of which are OK and some which are not? (Are the good ones the first eight, or some 'random' ones?) After you've checked for missing stuff, per comment by Mathew Lock. Commented Aug 10, 2016 at 1:54
  • Use Data::Dumper and dump each row to see what the last line it processes is Commented Aug 10, 2016 at 3:24

1 Answer

Text::CSV_XS is failing silently: getline_hr returns undef on a parse error, which ends the while loop just as end of file would. If you put the following after your while loop:

 my ($cde, $str, $pos) = $csv->error_diag ();
 print "$cde, $str, $pos\n";

You can see whether parsing hit an error. For this file, the output is:

2034, EIF - Loose unescaped quote, 336

Which means the column:

GT New Coupe 5.0L CD Wheels: 18" x 8" Magnetic Painted/Machined 6 Speakers

contains unescaped quote characters (the " inch marks after 18 and 8 sit inside an unquoted field).
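As a rough sketch of how that check could be wired into the loop: getline_hr returns undef both at end of file and on a parse error, and the eof method tells the two apart. The tab-separated sample data below is made up for illustration (it is not the asker's file) and is read from an in-memory filehandle so the snippet is self-contained.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample: a header plus three rows; the second data row
# contains loose quotes (inch marks) in an unquoted field.
my $data = "name\tsize\nok\t1\nWheels: 18\" x 8\"\t2\nnever reached\t3\n";
open my $fh, '<', \$data or die "can't open in-memory file: $!";

my $csv = Text::CSV_XS->new({ sep_char => "\t", binary => 1 });
$csv->column_names($csv->getline($fh));

my $count = 0;
while (my $row = $csv->getline_hr($fh)) {
    $count++;
}

# The loop ends on undef. If the parser did not hit end of file,
# the undef came from a parse error, so report it.
unless ($csv->eof) {
    my ($cde, $str, $pos) = $csv->error_diag;
    warn "stopped after $count rows: $cde, $str, $pos\n";
}
```

With a check like this, a file that stops parsing early produces a message instead of a short, silent loop.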

The Text::CSV perldoc states:

allow_loose_quotes

By default, parsing fields that have quote_char characters inside an unquoted field, like

1,foo "bar" baz,42

would result in a parse error. Though it is still bad practice to allow this format, we cannot help there are some vendors that make their applications spit out lines styled like this.

If you change your arguments to the creation of Text::CSV_XS to:

my $csv = Text::CSV_XS->new ({ sep_char => "\t", binary => 1,
    eol=> "\n", allow_loose_quotes => 1 });

That problem goes away, at least until row 105265, where error 2023 appears:

2023, EIQ - QUO character not allowed, 406

Details of this error in the perldoc:

2023 "EIQ - QUO character not allowed"

Sequences like "foo "bar" baz",qu and 2023,",2008-04-05,"Foo, Bar",\n will cause this error.

Setting the quote character to empty (quote_char => '' in your call to Text::CSV_XS->new()) does seem to work around this and allows the whole file to be processed. However, I would take the time to check whether this is a sane option for your CSV data.

TL;DR The long and short is that your CSV is not in the greatest format, and you will have to work around it.
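If you would rather have the parser speak up on its own, Text::CSV_XS also has an auto_diag attribute. The sketch below assumes auto_diag => 1, which the documentation describes as calling error_diag automatically so that a warning is printed when a parse error occurs (values greater than 1 are documented to die instead). The sample data is again made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample: the second data row has loose quotes.
my $data = "name\tsize\nok\t1\nWheels: 18\" x 8\"\t2\n";
open my $fh, '<', \$data or die "can't open in-memory file: $!";

# auto_diag => 1 asks the parser to warn about parse errors itself,
# so no manual error_diag call is needed after the loop.
my $csv = Text::CSV_XS->new({
    sep_char  => "\t",
    binary    => 1,
    auto_diag => 1,
});
$csv->column_names($csv->getline($fh));

my $count = 0;
while (my $row = $csv->getline_hr($fh)) {
    $count++;
}
print "parsed $count data rows\n";
```

This does not fix the malformed rows; it only makes the failure visible at the point where the loop stops.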

2 Comments

Sorry for the late accept, but thanks for the great and detailed response. I wish in Text::CSV the default was to fail explicitly with a silent fail option. Thanks!
Thank you. I couldn't for the life of me figure out why my code was suddenly failing.