
(I searched and expected this question to have been asked before, but couldn't find anything quite like it, although there are plenty of similar questions.)

I want this for loop to run in 3 different threads/processes, and wait seems to be the right command:

for file in 1.txt 2.txt 3.txt 4.txt 5.txt
        do something lengthy &
        i=$((i + 1))
        wait $!
done

But this construct, I guess, just starts one thread and then waits until it is done before it starts the next one. I could place wait outside the loop, but how do I then

  1. Access the pids?
  2. Limit it to 3 threads?
  • Do I understand correctly that you want five mutually independent tasks to be processed in three threads (with queuing as-it-happens) and the sole purpose of wait is to make sure that nothing else happens before all five have exited? Commented Apr 13, 2018 at 18:54
  • You don't necessarily have to give wait a PID. If you call wait with no arguments it will wait on all background processes, so putting the wait after done will wait for all threads to complete (see the sketch after these comments). Not sure how to limit to 3 threads though... Commented Apr 13, 2018 at 18:57
  • @Dario I have two functions, 1 and 2. 1 (the one above) can be parallelised, but 2 can't be run until all 5 files are processed. I have 4 cores and I need to leave one alone so everything else can run uninterrupted. If I understand your question correctly, the answer is "yes". Commented Apr 13, 2018 at 19:14
  • These are processes, not threads. Commented Apr 13, 2018 at 19:39
  • bash by itself isn't really suitable for maintaining a process pool like this. Commented Apr 13, 2018 at 19:54
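
A minimal sketch of the wait-with-no-arguments suggestion from the comments, where something_lengthy is a hypothetical placeholder for the real command (note this starts all five jobs at once and does not limit them to 3):

#!/bin/bash
for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
    something_lengthy "$file" &   # hypothetical placeholder for the real work
done
wait   # with no arguments: blocks until every background job has exited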

4 Answers


The jobs builtin can list the currently running background jobs, so you can use that to limit how many you create. To limit your jobs to three, try something like this:

for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
  if [ $(jobs -r | wc -l) -ge 3 ]; then
    wait $(jobs -r -p | head -1)
  fi

  # Start a slow background job here:
  (echo Begin processing $file; sleep 10; echo Done with $file)&
done
wait # wait for the last jobs to finish

3 Comments

More than one job could complete by the time the job you choose to wait on completes. This isn't a good way to keep your process pool busy.
(wait -n, introduced in bash 4.3, is an improvement, in that you only have to block until an arbitrary process completes, but that doesn't mean that only one process has completed, and jobs can continue to complete while you are deciding how many new processes you can start.)
True, although more importantly the job that we're waiting on may actually be the last of the three to finish -- who knows -- so it's not optimal. As you say in the question comments, bash on its own isn't really suitable for managing concurrency. However, given the bash primitives, this is a relatively simple way to avoid going over the process limit, even though it may underutilize the pool.
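
For reference, a minimal sketch of the wait -n variant mentioned in these comments (assumes bash 4.3 or later; something_lengthy is a hypothetical placeholder):

#!/bin/bash
max_jobs=3
for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
    while [ $(jobs -r | wc -l) -ge $max_jobs ]; do
        wait -n   # block until any one background job finishes (bash 4.3+)
    done
    something_lengthy "$file" &   # placeholder for the real work
done
wait   # wait for the remaining jobs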

GNU Parallel might be worth a look.

My first attempt,

parallel -j 3 'bash -c "sleep {};   echo {};"' ::: 4 1 2 5 3

can, according to the inventor of parallel, be shortened to

parallel -j3 sleep {}\; echo {} ::: 4 1 2 5 3

which prints:

1
2
4
3
5

and escaping the semicolon in a way that is friendlier to type, like this:

parallel -j3 sleep {}";" echo {} ::: 4 1 2 5 3

works too.

It doesn't look trivial, and I have only tested it twice so far, once to answer this question. parallel --help points to a source with more info; the man page is a little overwhelming. :)

parallel -j 3 "something lengthy {}" ::: {1..5}.txt

might work, depending on whether something lengthy is a program (fine) or just bash code (AFAIK, you can't just call a bash function in parallel with parallel).
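
If the lengthy step is a shell function rather than a standalone program, one pattern that is often shown for GNU parallel is to export the function with bash's export -f first. A minimal sketch, where something_lengthy is a hypothetical placeholder for the real work:

#!/bin/bash
# Hypothetical placeholder standing in for the real work.
something_lengthy() {
    echo "Begin processing $1"
    sleep 2
    echo "Done with $1"
}
export -f something_lengthy   # make the function visible to the shells parallel spawns

# Run at most 3 invocations at a time, one per argument.
parallel -j 3 something_lengthy ::: 1.txt 2.txt 3.txt 4.txt 5.txt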

On Xubuntu 16.04, parallel wasn't installed by default, but it is available in the repositories.

2 Comments

First example shorter: parallel -j3 sleep {}\; echo {} ::: 4 1 2 5 3
@OleTange: Hi Ole, and thanks for parallel. Seen 3 or 4 of your videos so far, and tutorial is open in one of the 40 tabs, waiting for me to have some more time.

Building on Rob Davis' answer:

#!/bin/bash
qty=3

for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
    while [ $(jobs -r | wc -l) -ge $qty ]; do
        sleep 1
        # jobs #(if you want an update every second on what is running)
    done
    echo -n "Begin processing $file "
    something_lengthy "$file" &
    echo $!
done
wait



You can use a subshell approach, for example:

 ( (sleep 10) &
    p1=$!
    (sleep 20) &
    p2=$!
    (sleep 15) &
    p3=$!
    wait
    echo "all finished ..." )

Note that wait here waits for all children inside the subshell. You can use the modulo operator (%) with 3 and use the remainder to check for the 1st, 2nd and 3rd process IDs (if needed), or use it to run 3 parallel processes; see the sketch below. Hope this helps.
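
A minimal sketch of that modulo idea, again with something_lengthy as a hypothetical placeholder (it waits after every batch of 3, so it can sit idle while the slowest job of a batch finishes):

#!/bin/bash
i=0
for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
    something_lengthy "$file" &   # hypothetical placeholder for the real work
    i=$((i + 1))
    if [ $((i % 3)) -eq 0 ]; then
        wait   # pause until the current batch of 3 has finished
    fi
done
wait   # wait for any remaining jobs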

