3

I am trying to run a Python script that takes two input files, as shown below. I have about 300 such pairs of inputs, so I wonder if somebody could advise how to run them with GNU parallel.

The single run looks like:

python stable.py KOG_1.fan KOG_1.fasta > KOG_1.stable

My attempt with parallel, which does not work:

ls *.fan; ls *.fasta | parallel python stable.py {} {} > {.}.stable

But how do I specify that it has to run with _1.fan and _1.fasta, then _2.fan and _1.fasta, and so on, until _300.fan and _300.fasta?

  • shouldn't it be _2.fasta the second time? Commented Feb 5, 2016 at 12:19
  • removed the useless phrases. They are deprecated on SO, please don't add them back. Commented Feb 5, 2016 at 12:19

3 Answers

2

This is not really a Python question, it's a question about GNU parallel. You could try this if all files are prefixed with "KOG_":

seq 1 300 | parallel python stable.py KOG_{}.fan KOG_{}.fasta ">" KOG_{.}.stable

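The quotes around the redirect (">") are important: they pass the redirect through to parallel so that each job writes its own file. Without them, your shell handles the redirect itself and all of the output ends up in one file.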

To handle generic prefixes:

ls *fan *fasta | parallel --max-lines=2 python stable.py {1} {2} ">" {1.}.stable

This uses the --max-lines option to take 2 lines per command. Of course, this works only if the *.fan and *.fasta files match up: there must be the same number of each, and the numbers need to match, otherwise you'll end up pairing files that shouldn't be paired. If that is a problem, you can work out a command that feeds pairs to parallel more robustly.
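
One possible way to feed pairs more robustly is to build each pair explicitly and skip any .fan file that has no partner. The following is only a sketch, assuming a bash-like shell; list_pairs is a hypothetical helper name, not part of GNU parallel, and the tab-separated output format is my own choice.

```shell
#!/usr/bin/env bash
# list_pairs [DIR]: print "<fan><TAB><fasta>" for every .fan file in DIR
# that has a .fasta file with the same base name.
# Unpaired .fan files are reported on stderr and skipped.
# (Hypothetical helper for illustration only.)
list_pairs() {
    local dir=${1:-.} fan base
    for fan in "$dir"/*.fan; do
        [ -e "$fan" ] || continue      # the glob matched nothing
        base=${fan%.fan}               # strip the .fan suffix
        if [ -e "$base.fasta" ]; then
            printf '%s\t%s\n' "$fan" "$base.fasta"
        else
            printf 'skipping %s: no matching .fasta\n' "$fan" >&2
        fi
    done
}

# Possible usage (requires GNU parallel; --colsep splits each line into {1} and {2}):
#   list_pairs . | parallel --colsep '\t' python stable.py {1} {2} '>' {1.}.stable
```

Because each line already carries both file names, the pairing no longer depends on how ls happens to order its output.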


2 Comments

Thank you. I will amend the question. Yes, it is a GNU parallel question indeed. I am new to GNU tools but I have found them very useful. I am using a Mac, so I am trying to switch to GNU-based things like GNU sed.
Thank you so much for the one-liner; it works perfectly using seq 1 300. Luckily all my files have the same prefix KOG_ and are also paired. You have given my day a brilliant start.
1

Try:

parallel python stable.py {} {.}.fasta '>' {.}.stable ::: *fan
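
To see what the one-liner above builds, it may help to print the command lines without running them. The sketch below imitates that in plain shell: for simple file names like these, ${f%.*} removes the last extension, which is what {.} does in GNU parallel. print_commands is a hypothetical name used only for illustration.

```shell
# print_commands FILE...: for each .fan file given, print the command line
# that would be run for it. Nothing is executed.
# (Hypothetical helper for illustration only.)
print_commands() {
    local f stem
    for f in "$@"; do
        stem=${f%.*}    # drop the last extension, e.g. KOG_1.fan -> KOG_1
        printf 'python stable.py %s %s.fasta > %s.stable\n' "$f" "$stem" "$stem"
    done
}

# Example:
#   print_commands KOG_1.fan KOG_2.fan
```

GNU parallel also has a --dry-run option that prints the commands it would run instead of executing them, which serves the same purpose.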

0

I recommend you split this task in two steps:

  1. Create a jobs file containing all the commands you want to run with parallel. You need to create a text file jobs.txt that should be similar to the one shown below:

    python stable.py KOG_1.fan KOG_1.fasta > KOG_1.stable
    python stable.py KOG_2.fan KOG_2.fasta > KOG_2.stable
    python stable.py KOG_3.fan KOG_3.fasta > KOG_3.stable
    python stable.py KOG_4.fan KOG_4.fasta > KOG_4.stable
    ...
    python stable.py KOG_300.fan KOG_300.fasta > KOG_300.stable
    

    If all your files are prefixed with KOG, you can build up this file this way:

    for I in `seq 300`; do echo "python stable.py KOG_$I.fan KOG_$I.fasta > KOG_$I.stable" >> jobs.txt; done;
    
  2. Run parallel using the jobs file

    Once you have the jobs file, you just need to run the following command:

    parallel -j4 < jobs.txt    
    

    Note that -j4 indicates that at most 4 commands from your jobs file will be running in parallel. You can adjust that according to the number of cores available on your computer.
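
Note that the loop in step 1 appends with >>, so running it twice would list every job twice. A variant that rebuilds the jobs file from the files actually present might look like the sketch below. It assumes file names without whitespace; write_jobs is a hypothetical helper name, not an existing tool.

```shell
# write_jobs OUTFILE [DIR]: write one command line per .fan/.fasta pair
# found in DIR, overwriting OUTFILE each time.
# (Hypothetical helper for illustration only.)
write_jobs() {
    local out=$1 dir=${2:-.} fan stem
    : > "$out"                               # truncate so re-runs do not duplicate jobs
    for fan in "$dir"/*.fan; do
        stem=${fan%.fan}
        [ -e "$stem.fasta" ] || continue     # skip unpaired files and an unmatched glob
        printf 'python stable.py %s %s > %s\n' \
            "$fan" "$stem.fasta" "$stem.stable" >> "$out"
    done
}

# Possible usage (requires GNU parallel):
#   write_jobs jobs.txt . && parallel -j4 < jobs.txt
```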

1 Comment

Thank you for your response. I tried option number 1 and it works very well!
