I have two programs that serve different purposes. They are called as follows:

./FolderCounter <PATH TO FOLDER> traceX
./VideoCounter <PATH TO VIDEO> traceY

To run these applications I use the following GNU parallel commands:

parallel ./FolderCounter {} trace3 ::: $(cat PatinN_files.txt) &> data_output/Result_PatinN_files.txt
parallel ./FolderCounter {} trace5 ::: $(cat PatinS_files.txt) &> data_output/Result_PatinS_files.txt
parallel ./VideoCounter  {} trace3 ::: $(cat PatinN_videos.txt) &> data_output/Result_PatinN_video.txt
parallel ./VideoCounter  {} trace5 ::: $(cat PatinS_videos.txt) &> data_output/Result_PatinS_video.txt

My goal is to combine these four lines into a single GNU parallel command, so that it can better manage the number of parallel jobs and start the next batch of files as soon as processors become available.

How can I do that?

1 Answer

First: Don't do:

parallel ... ::: $(cat foo)

Do:

parallel ... :::: foo

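In most cases this will do what you want, whereas the first form may cause problems if foo contains lines with spaces: the shell splits the output of $(cat foo) on whitespace, so such a line is passed to parallel as several separate arguments.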
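The splitting comes from the shell, not from GNU Parallel, so it can be seen without parallel at all. The following is a minimal sketch; the file contents and the `count_args` helper are made up for illustration.

```shell
# Hypothetical input file: two paths, one of which contains a space.
tmp=$(mktemp -d)
printf '%s\n' 'videos/clip one.mp4' 'videos/clip2.mp4' > "$tmp/foo"

# Illustrative helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Unquoted $(cat ...) is split on whitespace, so the first path breaks in two.
split_count=$(count_args $(cat "$tmp/foo"))

# The file itself has one path per line.
line_count=$(wc -l < "$tmp/foo" | tr -d ' ')

echo "word-split arguments: $split_count"
echo "lines in file: $line_count"
rm -r "$tmp"
```

The two numbers differ (3 arguments versus 2 lines), which is the mismatch that :::: avoids by letting parallel read the file line by line itself.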

I assume that PatinN_files.txt has the same number of lines as PatinN_videos.txt.

Normally I would do two runs, a trace3 run and a trace5 run:

parallel ./FolderCounter {1} trace3 ";" ./VideoCounter {2} trace3  ::::+ PatinN_files.txt PatinN_videos.txt &> data_output/Result_PatinN.txt
parallel ./FolderCounter {1} trace5 ";" ./VideoCounter {2} trace5  ::::+ PatinS_files.txt PatinS_videos.txt &> data_output/Result_PatinS.txt
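Here ::::+ links the second file to the first, so {1} and {2} come from the same line number of each file. For files of equal length the pairing is roughly what paste produces; a small sketch with made-up file names and contents:

```shell
# Hypothetical stand-ins for PatinN_files.txt and PatinN_videos.txt.
tmp=$(mktemp -d)
printf '%s\n' folderA folderB > "$tmp/files.txt"
printf '%s\n' a.mp4 b.mp4 > "$tmp/videos.txt"

# paste joins line N of each file; each output line corresponds to one
# ({1}, {2}) pair when the two files have the same number of lines.
pairs=$(paste -d ' ' "$tmp/files.txt" "$tmp/videos.txt")
echo "$pairs"
rm -r "$tmp"
```

Each line printed corresponds to one job: the first column would fill {1} and the second {2}.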

Alternatively, you can use GNU Parallel to generate all the commands first and then run them (this does not require the txt files to have the same number of lines):

(
 parallel --dry-run ./FolderCounter {} trace3 :::: PatinN_files.txt
 parallel --dry-run ./FolderCounter {} trace5 :::: PatinS_files.txt
 parallel --dry-run ./VideoCounter  {} trace3 :::: PatinN_videos.txt
 parallel --dry-run ./VideoCounter  {} trace5 :::: PatinS_videos.txt
) | parallel &> data_output/Result.txt
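What the final parallel receives on stdin is just a list of command lines, one per job. The sketch below builds a similar list with a plain shell loop to show its shape. The `gen` helper and the sample file contents are hypothetical, and unlike --dry-run this loop does no shell quoting, so it assumes paths without spaces or special characters.

```shell
# Hypothetical, shortened input lists.
tmp=$(mktemp -d)
printf '%s\n' dirA dirB > "$tmp/PatinN_files.txt"
printf '%s\n' vid1.mp4 > "$tmp/PatinS_videos.txt"

# gen <program> <trace> <list file>: print one command line per input line.
gen() {
  while IFS= read -r item; do
    printf '%s %s %s\n' "$1" "$item" "$2"
  done < "$3"
}

# Collect the commands from all lists into one stream.
cmds=$(
  gen ./FolderCounter trace3 "$tmp/PatinN_files.txt"
  gen ./VideoCounter trace5 "$tmp/PatinS_videos.txt"
)
echo "$cmds"

# With GNU Parallel installed, the list could then be run with:
#   echo "$cmds" | parallel
rm -r "$tmp"
```

Because all jobs end up in one stream, a single parallel instance schedules them and starts the next job as soon as a slot is free.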

To track which input generated which output, replace the last line with:

) | parallel --tag &> data_output/Result.txt

Getting the output into four separate files is a bit harder. It can be done if it is really needed, but it is not as elegant as the above.

If you simply want to run jobs whenever there are spare CPUs sitting idle, you can use --load 100%:

parallel --load 100% ./FolderCounter {} trace3 :::: PatinN_files.txt &> data_output/Result_PatinN_files.txt &
parallel --load 100% ./FolderCounter {} trace5 :::: PatinS_files.txt &> data_output/Result_PatinS_files.txt &
parallel --load 100% ./VideoCounter  {} trace3 :::: PatinN_videos.txt &> data_output/Result_PatinN_video.txt &
parallel --load 100% ./VideoCounter  {} trace5 :::: PatinS_videos.txt &> data_output/Result_PatinS_video.txt &
wait

Each instance will start a new job only if the instantaneous load is less than the number of CPUs.
