How to run two commands with different inputs with GNU Parallel?

Question

I have two programs that serve different purposes and are called as follows:

./FolderCounter <PATH TO FOLDER> traceX
./VideoCounter <PATH TO VIDEO> traceY

To run these applications I use the following GNU parallel commands:

parallel ./FolderCounter {} trace3 ::: $(cat PatinN_files.txt) &> data_output/Result_PatinN_files.txt
parallel ./FolderCounter {} trace5 ::: $(cat PatinS_files.txt) &> data_output/Result_PatinS_files.txt
parallel ./VideoCounter  {} trace3 ::: $(cat PatinN_videos.txt) &> data_output/Result_PatinN_video.txt
parallel ./VideoCounter  {} trace5 ::: $(cat PatinS_videos.txt) &> data_output/Result_PatinS_video.txt

My goal is to combine these four lines into a single GNU parallel command, so that it can better manage the number of parallel jobs and start the next batch of files as soon as processors become available.

How can I do that?

Ole Tange · Accepted Answer · 2016-09-08 20:01:58Z

First: Don't do:

parallel ... ::: $(cat foo)

Do:

parallel ... :::: foo

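In most cases this will do what you want, whereas the first form may cause problems if foo contains lines with spaces, because the shell splits the output of $(cat foo) on whitespace before GNU Parallel sees it.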
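A minimal sketch of the difference, assuming GNU Parallel is installed. The temporary file and its two lines are made up for the demonstration, and echo stands in for a real command.

```shell
# Hypothetical input: two lines, the first one contains a space.
list=$(mktemp)
printf '%s\n' 'my folder/a' 'folder/b' > "$list"

# ::: $(cat file): the shell word-splits the file, so parallel gets 3 arguments.
split_count=$(parallel -k echo {} ::: $(cat "$list") | wc -l | tr -d ' ')

# :::: file: parallel reads the file itself, one argument per line (2 arguments).
line_count=$(parallel -k echo {} :::: "$list" | wc -l | tr -d ' ')

echo "::: \$(cat) gave $split_count jobs, :::: gave $line_count jobs"
rm -f "$list"
```

With the first form, "my folder/a" is handed to the command as two separate arguments, "my" and "folder/a".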

I assume that PatinN_files.txt has the same number of lines as PatinN_videos.txt.

Normally I would do two runs, a trace3 run and a trace5 run:

parallel ./FolderCounter {1} trace3 ";" ./VideoCounter {2} trace3  ::::+ PatinN_files.txt PatinN_videos.txt &> data_output/Result_PatinN.txt
parallel ./FolderCounter {1} trace5 ";" ./VideoCounter {2} trace5  ::::+ PatinS_files.txt PatinS_videos.txt &> data_output/Result_PatinS.txt
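To illustrate how linked input sources pair up lines, here is a small sketch, assuming a GNU Parallel version that supports ::::+. The temporary files and their contents are hypothetical, echo stands in for the two programs, and the two sources are spelled separately as `:::: A ::::+ B`.

```shell
# Hypothetical stand-ins for PatinN_files.txt and PatinN_videos.txt.
files=$(mktemp); videos=$(mktemp)
printf '%s\n' dirA dirB > "$files"
printf '%s\n' a.mp4 b.mp4 > "$videos"

# ::::+ links the second file to the first: line N is paired with line N,
# instead of every combination being generated.
# echo stands in for the real programs; -k keeps the output in input order.
paired=$(parallel -k echo FolderCounter {1} trace3 ";" echo VideoCounter {2} trace3 \
  :::: "$files" ::::+ "$videos")

printf '%s\n' "$paired"
rm -f "$files" "$videos"
```

Each job runs both commands one after the other, so dirA is processed together with a.mp4 and dirB with b.mp4. This is why the two list files should have the same number of lines.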

Alternatively you can simply use GNU Parallel to first generate all the commands to run and then run them (this does not require the txt-files to have the same number of lines):

(
 parallel --dry-run ./FolderCounter {} trace3 :::: PatinN_files.txt
 parallel --dry-run ./FolderCounter {} trace5 :::: PatinS_files.txt
 parallel --dry-run ./VideoCounter  {} trace3 :::: PatinN_videos.txt
 parallel --dry-run ./VideoCounter  {} trace5 :::: PatinS_videos.txt
) | parallel &> data_output/Result.txt
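The two stages can be seen separately in the following sketch, assuming GNU Parallel is installed. The list files are hypothetical and deliberately differ in length, and echo stands in for ./FolderCounter and ./VideoCounter.

```shell
# Hypothetical stand-ins for the list files; the two lists differ in length.
folders=$(mktemp); videos=$(mktemp)
printf '%s\n' dirA dirB dirC > "$folders"
printf '%s\n' a.mp4 > "$videos"

# Stage 1: --dry-run prints each command instead of running it.
job_list=$(
  parallel -k --dry-run echo FolderCounter {} trace3 :::: "$folders"
  parallel -k --dry-run echo VideoCounter {} trace3 :::: "$videos"
)
printf '%s\n' "$job_list"

# Stage 2: a single parallel reads the commands on stdin and schedules
# all of them from one job queue.
output=$(printf '%s\n' "$job_list" | parallel -k)
printf '%s\n' "$output"
rm -f "$folders" "$videos"
```

Because the final parallel sees every command in one queue, it can start the next job as soon as a slot is free, regardless of which list the job came from.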

To track which input generates what output, use:

) | parallel --tag &> data_output/Result.txt
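A small sketch of what --tag adds when the commands arrive on stdin, assuming GNU Parallel is installed. The two commands are made up, with echo standing in for the real programs.

```shell
# Two hypothetical generated commands; echo stands in for the real programs.
tagged=$(printf '%s\n' 'echo counted dirA' 'echo counted a.mp4' | parallel -k --tag)

# Each output line is prefixed with its input line (here, the command itself),
# followed by a TAB and then the command's own output.
printf '%s\n' "$tagged"
```

The tag makes it possible to filter the combined result file afterwards, for example with grep on the program name or the trace argument.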

To get the log output into 4 different files is a bit harder. If that is really needed it can be done, but is not as elegant as the above.

If you simply want to run the jobs whenever there are spare CPUs sitting idle, you can use --load 100%:

parallel --load 100% ./FolderCounter {} trace3 :::: PatinN_files.txt &> data_output/Result_PatinN_files.txt &
parallel --load 100% ./FolderCounter {} trace5 :::: PatinS_files.txt &> data_output/Result_PatinS_files.txt &
parallel --load 100% ./VideoCounter  {} trace3 :::: PatinN_videos.txt &> data_output/Result_PatinN_video.txt &
parallel --load 100% ./VideoCounter  {} trace5 :::: PatinS_videos.txt &> data_output/Result_PatinS_video.txt &
wait

Each of these will start a job only if the instantaneous load is less than the number of CPUs.
