I have several large CSV files (about 6,000 rows and roughly 60 columns each) that I would like to split into separate CSV files at a given string. The number of lines between occurrences of the string varies, and each output file should be named after the string that appears in the first column of its first row. For example:

Peter  B1  C1  D1
A2     B2  C2  D2
A3     B3  C3  D3
END    B4  C4  D4
Jack   B5  C5  D5
A6     B6  C6  D6
A7     B7  C7  D7
END    B8  C8  D8 
Billy  B9  C9  D9
A10    B10 C10 D10 
A11    B11 C11 D11
END    B12 C12 D12

So there should be 3 files named Peter, Jack and Billy, with the word END signalling the last row to be written to each file. Peter contains the range A1 (which holds the word Peter) to D4, Jack contains A5 to D8, and Billy contains A9 to D12.

I have this so far:

use strict;
use warnings;

### INPUT
my $split_woord = 'END';       #word that signals file to be split
print "Input file: ";
my $file_name = <STDIN>;

my $input_file = "file locataion/$file_name.csv";

### OPEN
open (INPUT, ">", "$input_file") or die "Can't open $file_name: $!\n";

my $name= undef;

while (<INPUT>){

  my $line = $_;

  my ($a,$b,$c,$d)=split('\,', $line);

  until ($a eq $split_word){     #loop until column 1 reads 'END', then restart
    $name eq $a;                 #want to indictae first line

    my $output_file = "file_location/$name.csv";
    open (OUTPUT, ">>", "$output_file") or die "Can't create $output_file: $!\n";

    print OUTPUT "$a,$b,$c,$d\n";
    next;

    }

}

exit;

I can't seem to get it to loop properly, and I'm also struggling to use the first column of the first row as the name for the file. Any help will be tremendously appreciated! TIA

  • csplit is a shell command. Have you tried it? Commented Oct 12, 2016 at 18:54
  • Also, please check the other related SO questions: stackoverflow.com/questions/8272017/… Commented Oct 12, 2016 at 18:56
  • Did you mean to do an assignment here? Instead of $name eq $a; (#want to indicate first line), perhaps $name = $a; Commented Oct 12, 2016 at 18:57
  • Not an assignment, no... I'm trying to make my work life easier with the large data files I receive, so I don't have to struggle with splitting them in Excel. I had a look at csplit. It doesn't solve the file naming, but I will give it a try nonetheless! Commented Oct 12, 2016 at 19:57
  • @DKru When jmcneirney says "assignment," he means the assignment operator, not a homework assignment. $name eq $a doesn't make any sense by itself (and you should be getting the warning "Useless use of string eq in void context"). Maybe you meant to assign $a to $name, i.e. $name = $a;? Commented Oct 12, 2016 at 20:54

1 Answer


First of all, your line:

open (INPUT, ">", "$input_file") 

Looks like it's opening a file for WRITING -- you wanted to read it, right?
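
The mode argument decides this: '>' truncates the file and writes to it, while '<' reads from it. As a minimal sketch, assuming a hypothetical helper named open_csv_for_reading (not part of the question's code) and a base name that still carries the trailing newline left by <STDIN>:

```perl
use strict;
use warnings;

# Hypothetical helper: takes a directory and a base name as typed by the user
# (possibly with a trailing newline from <STDIN>) and returns a read handle.
sub open_csv_for_reading {
    my ($dir, $base_name) = @_;
    chomp $base_name;                      # drop the newline left by <STDIN>
    my $path = "$dir/$base_name.csv";
    open my $fh, '<', $path                # '<' reads; '>' would truncate the file
        or die "Can't open $path: $!\n";
    return $fh;
}
```

It could be called as my $in = open_csv_for_reading('.', scalar <STDIN>); the chomp matters because, without it, the newline ends up inside the file name.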

If you're really dealing with a true CSV file, you may want to explore Text::CSV instead of splitting just on commas. It comes standard with all recent versions, and it handles the inevitable:

ID        Quote                Date
1         No, I'm fine         1/1/2016
2         Roger Winco          5/1/2016
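
To illustrate why a plain split on commas is not enough, here is a small sketch. It swaps in the core module Text::ParseWords rather than Text::CSV, so that it runs without installing anything, and the sample row is an assumption: the first table row above, quoted the way a CSV exporter would typically write it.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # ships with Perl

# Assumed sample row: the comma inside the quoted field is data, not a separator.
my $row = q{1,"No, I'm fine",1/1/2016};

my @naive  = split /,/, $row;              # breaks the quoted field in two
my @fields = parse_line(',', 0, $row);     # 0 = strip the surrounding quotes

print scalar(@naive),  " fields from split\n";
print scalar(@fields), " fields from parse_line\n";
print "quote column: $fields[1]\n";
```

Text::ParseWords is not a full CSV parser: it treats an apostrophe in an unquoted field as an opening quote, and it does not handle fields with embedded newlines, so a dedicated CSV module is likely the safer choice for real data.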

That said, the real issue at hand...

Assuming the names don't repeat, you should be able to open an output filehandle and continue using it until it hits the terminating word:

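my $OUTPUT;    # current output filehandle; undef between blocks

open my $INPUT, '<', "$file_name.csv" or die "Can't open $file_name.csv: $!\n";
while (<$INPUT>) {
  my ($first) = split /,/, $_, 2;    # first column only

  # No file is open, so this line starts a new block: name the file after it
  if (!defined $OUTPUT) {
    open $OUTPUT, '>', "$first.csv" or die "Can't create $first.csv: $!\n";
  }

  print $OUTPUT $_;

  # The terminating word ends the block (the END row itself is still written)
  if ($first eq $split_woord) {
     close $OUTPUT;
     $OUTPUT = undef;
  }
}
close $INPUT;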
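
The same idea can be wrapped into a self-contained sketch. The subroutine name split_csv_blocks, its arguments, and the output-directory parameter are all hypothetical, and it assumes a plain comma-separated file with no quoted commas in the first column:

```perl
use strict;
use warnings;

# Hypothetical helper: split $in_path into one file per block inside $out_dir.
# A block starts at the top of the file or on the first line after an
# $end_word row, and is named after that line's first column.
# Returns the block names in the order they were written.
sub split_csv_blocks {
    my ($in_path, $out_dir, $end_word) = @_;
    my ($out, @names);

    open my $in, '<', $in_path or die "Can't open $in_path: $!\n";
    while (my $line = <$in>) {
        next if !defined $out && $line !~ /\S/;   # skip blank lines between blocks

        my ($first) = split /,/, $line, 2;        # first column only
        $first =~ s/\s+\z//;                      # trim line ending on one-column rows

        if (!defined $out) {                      # first row of a new block
            my $out_path = "$out_dir/$first.csv";
            open $out, '>', $out_path or die "Can't create $out_path: $!\n";
            push @names, $first;
        }

        print {$out} $line;

        if ($first eq $end_word) {                # the END row closes the block
            close $out or die "Can't close output file: $!\n";
            undef $out;
        }
    }
    close $out if defined $out;                   # input did not end with an END row
    close $in;
    return @names;
}
```

A call such as split_csv_blocks("$file_name.csv", '.', 'END') would then write Peter.csv, Jack.csv and Billy.csv for the sample data in the question. Opening with '>' means a repeated name would overwrite the earlier file, which is why the assumption that names don't repeat matters.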

2 Comments

"It comes standard with all recent versions" Unfortunately, this is not the case. Maybe you're thinking of Text::Balanced or Text::ParseWords?
@ThisSuitIsBlackNot -- I thought it was, but I must be mistaken. Thanks for the correction
