
I'm running a Perl script with 30 threads, each running a subroutine. To each thread I'm supplying 100 data items. In the subroutine, after the code does what it's supposed to, I store the output in a csv file. However, I find that on execution, some of the data in the csv file has overlapped. For example, I'm writing name, age, gender, and country to the csv file this way-

print OUTPUT $name.",".$age.",".$gender.",".$country.",4\n";

The csv file should have outputs as such-

Randy,35,M,USA,4
Tina,76,F,UK,4

etc.

However, in the csv file, I see that some rows have overlapped or been written haphazardly in this way-

Randy,35,M,USA,4
TinaMike,76,UK
23,F,4

Is it because some threads are executing at the same time? What can I do to avoid this? I'm calling print only after I have the data. Any suggestions?

4 is the group id which will remain constant.

Below is the code snippet:

#!/usr/bin/perl

use DBI;
use strict;
use warnings;
use threads;
use threads::shared;

my $host = "1.1.1.1";
my $database = "somedb";
my $user = "someuser";
my $pw = "somepwd";

my @threads;


open(PUT,">/tmp/file1.csv") || die "can not open file";
open(OUTPUT,">/tmp/file2.csv") || die "can not open file";

my $dbh = DBI->connect("DBI:mysql:$database;host=$host", $user, $pw ,) || die "Could not connect to database: $DBI::errstr";
$dbh->{'mysql_auto_reconnect'} = 1;

my $sql = qq{
    -- some SQL to get primary keys
};

my $sth = $dbh->prepare($sql);
$sth->execute();
while(my @request = $sth->fetchrow_array())
{
#get other columns and print to file1.csv
            print PUT $net.",".$sub.",4\n";
            $i++; #this has been declared before
}


for ( my $count = 1; $count <= 30; $count++) {
        my $t = threads->new(\&sub1, $count);
        push(@threads,$t);
}
foreach (@threads) {
        my $num = $_->join;
        print "done with $num\n";
}

sub sub1 {
        my $num = shift;

        # calculate start_num and end_num based on an internal logic

        for(my $x=$start_num; $x<=$end_num; $x++){

                print OUTPUT $name.",".$age.",".$gender.",".$country.",4\n";
                $j++; #this has been declared before
            }

        sleep(1);
        return $num;
}

The problem is in file2, which is written through the OUTPUT handle.

  • You should lock the file before printing to it, and unlock when done. But the simplest way is to print to STDOUT and then just redirect the output to a new csv file.
  • Printing to STDOUT won't help - you'll hit the same problems. Each of your threads is effectively a separate interpreter printing to the same filehandle. There's nothing that guarantees they won't interrupt each other's print statements.

2 Answers


You are multithreading and printing to a file from multiple threads. This will always end badly - print is not an 'atomic' operation, so different prints can interrupt each other.

What you need to do is serialize your output such that this cannot happen. The simplest way is to use a lock or a semaphore:

    use threads::shared;    # needed for the ':shared' attribute

    my $print_lock : shared;

    {
        lock($print_lock);
        print OUTPUT $stuff, "\n";
    }

When the lock goes out of scope, it will be released.
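Putting the pieces together, a minimal self-contained sketch of the lock approach might look like this (the filename, worker logic, and row format here are illustrative, not taken from the question's code; autoflush is enabled so each locked print reaches the file immediately rather than sitting in a per-thread buffer):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use IO::Handle;    # provides the autoflush method

# One shared variable serves as the lock token for all threads.
my $print_lock : shared;

open( my $out_fh, '>', '/tmp/locked_output.csv' ) or die "can not open file: $!";
$out_fh->autoflush(1);    # write through immediately, no per-thread buffering

sub worker {
    my ($id) = @_;
    for my $x ( 1 .. 3 ) {
        # Only one thread at a time may hold the lock, so each print
        # completes as a whole line before another thread can write.
        lock($print_lock);
        print {$out_fh} "thread$id,row$x,4\n";
    }
    return $id;
}

my @threads = map { threads->create( \&worker, $_ ) } 1 .. 4;
$_->join() for @threads;
close $out_fh;
```

Each line in the resulting file is intact; only the ordering of lines between threads is nondeterministic.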

Alternatively, have a separate thread that 'does' file IO, and use Thread::Queue to feed lines to it. Depends somewhat on whether you need any ordering/processing of the contents of 'OUTPUT'.

Something like:

    use Thread::Queue;

    my $output_q = Thread::Queue -> new();


    sub output_thread {
        open( my $output_fh, ">", "output_filename.csv" ) or die $!;

        while ( my $output_line = $output_q->dequeue() ) {
            print {$output_fh} $output_line, "\n";
        }

        close($output_fh);
    }

    sub doing_stuff_thread {
        $output_q->enqueue("something to output");    # "\n" added by the output thread
    }


     my $output_thread = threads -> create ( \&output_thread );
     my $doing_stuff_thread = threads -> create ( \&doing_stuff_thread );

     #wait for doing_stuff to finish - closing the queue will cause output_thread to flush/exit. 
     $doing_stuff_thread -> join();
     $output_q -> end;
     $output_thread -> join();

3 Comments

I don't have any idea about locks. Could you point me to some sites or snippets where I could get an understanding of locks?
Thank you. I'll research a little more then, as these are new to me :)
You may also want to look at autoflush as that may become relevant: perldoc.perl.org/IO/Handle.html - part of the problem is that, by default, file IO is buffered for efficiency. That can cause problems when multithreading. (But not if you have just one thread handling the IO, as then it's irrelevant.)
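The autoflush setting mentioned in the comment above can be enabled per filehandle; a tiny sketch (the filename is illustrative):

```perl
use strict;
use warnings;
use IO::Handle;    # provides the autoflush method on filehandles

open( my $fh, '>', '/tmp/flushed.csv' ) or die "can not open file: $!";
$fh->autoflush(1);    # disable buffering: each print reaches the OS immediately
print {$fh} "Randy,35,M,USA,4\n";
close $fh;
```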

Open the file handle globally, then use flock on it as demonstrated (the LOCK_EX, LOCK_UN, and SEEK_END constants come from the Fcntl module):

use Fcntl qw(:flock :seek);    # LOCK_EX, LOCK_UN, SEEK_END

sub log_write {
    my $line = shift;
    flock(OUTPUT, LOCK_EX)      or die "can't lock: $!";
    seek(OUTPUT, 0, SEEK_END)   or die "can't fast forward: $!";
    print OUTPUT $line;
    flock(OUTPUT, LOCK_UN)      or die "can't unlock: $!";
}

