I have a CSV file where each row looks something like this:

509,,SOME VALUE,0,1,1,0.23

I am trying to find all numbers of two or more digits, which may or may not be preceded or followed by a comma, and put them in an array using this Perl code:

my $file ='somefile.csv';

open my $DATA , "<", $file;
$_ = do {local $/; <$DATA>};
my @A = /,?(\d{2,}),?/g;
close $DATA;

As expected, it matches the first comma-separated value in the row above, but it also matches the 23 portion of the last value, 0.23. I would expect this not to match because of the period (.).

Could someone help me make my regex more specific so that it does not match the digits before or after the period?

  • The comma before the number is optional. If you want a comma or the beginning of the string, use (?:^|,) Commented Nov 6, 2013 at 16:59
  • You should read this. Commented Nov 6, 2013 at 17:34

1 Answer

It is often unwise to press regular expressions into doing too much in a program. It is easy to end up with convoluted and incomprehensible code that could have been implemented much more simply with standard Perl.

Slurping the whole file into memory at once also makes this problem more awkward than it needs to be. Reading the file line by line is usually the best and most efficient way.

I suggest you write something like this. It reads each line, trims the newline from the end, and uses split to separate it into fields. Then all the fields that match your criterion (consisting entirely of two or more decimal digits) are selected using grep and pushed onto the array @numbers.

use strict;
use warnings;

my $file = 'somefile.csv';

open my $data, '<', $file or die "Cannot open $file: $!";
my @numbers;
while (<$data>) {
  chomp;
  push @numbers, grep /^\d{2,}$/, split /,/;
}
close $data;

print "$_\n" for @numbers;

output

509

If you insist on following your current plan, then this alternative program will also work. But I hope you see that it is far less clear than my first suggestion.

use strict;
use warnings;

my $file = 'somefile.csv';

# Slurp the whole file into a single string.
my $data = do {
  open my $fh, '<', $file or die "Cannot open $file: $!";
  local $/;
  <$fh>;
};

my @numbers = $data =~ /(?:,|^)\K(\d{2,})(?=,|$)/gm;
print "$_\n" for @numbers;
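
To see why the anchored pattern rejects 0.23 while the original does not, here is a minimal sketch that runs both patterns over an in-memory string instead of a file. The first line is the sample row from the question; the second line is made-up data added only to show that the /m modifier lets ^ and $ match at each line boundary.

```perl
use strict;
use warnings;

# First line: the sample row from the question.
# Second line: hypothetical extra data, added for illustration only.
my $data = "509,,SOME VALUE,0,1,1,0.23\n12,,OTHER,345,1,1,6.78\n";

# Original pattern: both commas are optional, so nothing stops the
# digits from matching in the middle of a field such as 0.23.
my @loose = $data =~ /,?(\d{2,}),?/g;

# Anchored pattern: the digits must be preceded by a comma or the start
# of a line, and followed by a comma or the end of a line.
my @strict = $data =~ /(?:,|^)\K(\d{2,})(?=,|$)/gm;

print "loose:  @loose\n";
print "strict: @strict\n";
```

With this input the loose pattern should also pick up 23 and 78 from the decimal fields, while the anchored pattern should return only 509, 12 and 345. Because the trailing comma is checked with a lookahead rather than consumed, two adjacent numeric fields can both match.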