
I have a CSV file in the format shown below, and I'm splitting each line with Perl's split function, using a comma as the delimiter. The problem is that a quoted string such as "HTTP Large, GMS, ZMS: Large Files" contains embedded commas, so the split fails: the resulting array does not have the expected number of elements. How can I modify the split call to handle this?

  my @values = split('\,', $line);

CSV File

 10852,800 Mob to Int'l,235341739,573047,84475.40,0.0003,Inbound,Ber unit
 10880,"HTTP Large, GMS, ZMS: Large Files",52852810,128,13712.68,0.0002,,Rer unit
 13506,Presence National,2716766818,2447643,309116.40,0.0001,Presence,per Cnit
  • Your question begs the question - why not use (e.g.) the Text::CSV module instead, which handles this sort of gotcha for you? Commented Mar 23, 2012 at 4:58
  • One lesson all programmers should learn: Never parse CSV or HTML all by yourself. Use the existing modules, they are usually mature, stable and well tested. Commented Mar 23, 2012 at 10:25

2 Answers


Issues like embedded commas are precisely why modules such as Text::CSV were created. If, but only if, the data does not have embedded commas, then you can make regular expressions work. When the data has embedded commas, it is time to move to a tool designed to handle CSV with embedded commas, and that would be Text::CSV in Perl (and its relatives Text::CSV_PP and Text::CSV_XS).
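As a minimal sketch of that approach, assuming Text::CSV is installed from CPAN, the problem line from the question could be parsed as follows. The variable names are only illustrative.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 allows arbitrary bytes inside fields
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

# The line from the question that breaks a plain comma split
my $line = '10880,"HTTP Large, GMS, ZMS: Large Files",52852810,128,13712.68,0.0002,,Rer unit';

my @values;
if ($csv->parse($line)) {
    @values = $csv->fields();    # quoted field comes back as one element
} else {
    die "Failed to parse line: " . $csv->error_input();
}

print scalar(@values), " fields\n";    # 8 fields
print "$values[1]\n";                  # HTTP Large, GMS, ZMS: Large Files
```

The quotes are stripped from the second field, and the empty seventh field is kept as an empty string. When reading a whole file, the module's getline method on an open filehandle is usually a better fit than parsing line by line, since it also copes with fields that contain newlines.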


2 Comments

Which should I use, Text::CSV_PP or Text::CSV_XS? What is the difference? Will it work on perl v5.8.7 built for sun4-solaris-64-ld?
You use and install Text::CSV; it comes with the pure Perl implementation, Text::CSV_PP (the _PP suffix indicates 'pure Perl', not needing a C compiler). Then to wring the most performance out of your system, you install Text::CSV_XS, which uses the Perl extension mechanism and C code functions to implement a higher-speed version of the same code. Text::CSV has been around since before there was Perl 5.8; it will work fine with 5.8.7. The current maintainer's first release was in 2007, though. (Searching through my private archives, I found Text-CSV-0.01.tar.gz from July 1997.)
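To check which implementation Text::CSV actually loaded on a given machine, recent releases of the wrapper expose a few class methods. A short sketch, assuming a reasonably current Text::CSV is installed:

```perl
use strict;
use warnings;
use Text::CSV;

# Name of the module doing the real work: Text::CSV_XS or Text::CSV_PP
my $backend = Text::CSV->backend;
print "Backend: $backend\n";

print "Using the XS (C) implementation\n"    if Text::CSV->is_xs;
print "Using the pure-Perl implementation\n" if Text::CSV->is_pp;
```

Calling code does not need to change either way, because both backends are used through the same Text::CSV interface.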

I have used the same approach as you, and it works fine for me. Try this code.

my @values = split(/(?<="),(?=")/, $line);

Hope it helps.

2 Comments

Your code breaks on the OP's data, and on this: 1234,"What happens when you have an embedded "","" in your file?","Does it break?" If you just use the Text::CSV module, you know it will return the correct results.
@Ven'Tatsu I got your point, but I only suggested my option based on his example and code. It is a correction based on his question, which is about split.
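To see the failure described in the first comment, the lookaround split can be run against the quoted line from the question. This is a small sketch using only core Perl; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = '10880,"HTTP Large, GMS, ZMS: Large Files",52852810,128,13712.68,0.0002,,Rer unit';

# Splits only on a comma that sits directly between two double quotes
my @lookaround = split(/(?<="),(?=")/, $line);

# Plain comma split, as in the question
my @plain = split(/,/, $line);

print scalar(@lookaround), "\n";   # 1: the line has no "," sequence, so nothing is split
print scalar(@plain), "\n";        # 10: the quoted field is cut at its embedded commas
```

The line holds 8 logical fields, so neither pattern recovers them: the lookaround version only works when every field is quoted, and the plain version only works when no field contains a comma.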
