Run multiple commands in parallel, and return whenever one of them fails or all of them succeed

Question

I've seen the question "Bash run two commands and get output from both", which almost covers my need.

However, the wait command is blocking: if command 2 fails before command 1 succeeds, the script does not return when command 2 fails, only when command 1 succeeds.

Is it possible to run multiple commands in parallel and return 1 as soon as one of them fails, or 0 once all of them succeed (returning as early as possible in both cases)?

It would be even better if this is possible with standard commands (like xargs or parallel), but a solution written in bash is fine too.

If one of the commands fails, do you want to kill the others before returning, or keep them running?
– Alepac, Commented Aug 27, 2015 at 12:07
Have you looked at this thread? It simulates the behaviour of a barrier in bash.
– Aserre, Commented Aug 27, 2015 at 12:12
@Ploutox That's a good start but it crucially lacks the facility to exit early if one of the subprocesses fails. (And, it's trivial.)
– tripleee, Commented Aug 27, 2015 at 12:14

Alepac · Accepted Answer · 2015-08-27 14:29:40Z

This code returns the right exit code and kills the surviving process:

#!/bin/bash

# Trap SIGTERM and set RET_VAL to false
trap "RET_VAL=false" SIGTERM

MY_PID=$$
# Initialize RET_VAL to true
RET_VAL=true

# This function will be executed in a separate job (see below)
thread_listener() {
    # Start the long-running job
    ./longJob.sh &
    PID=$!
    # Trap SIGTERM and kill the long-running process
    trap "kill $PID" SIGTERM
    echo waiting for $PID
    echo Parent $MY_PID
    # Send a SIGTERM to parent job in case of failure
    wait $PID || kill $MY_PID
    exit
}

echo $MY_PID

# Runs thread listener in a separate job
thread_listener &
PID1=$!

# Runs thread listener in a separate job
thread_listener &
PID2=$!

wait
# Send SIGTERM to PID1 and PID2 if they are still running
kill $PID1 2> /dev/null
kill $PID2 2> /dev/null
# Exit with RET_VAL: running `true` or `false` as the last command sets the script's exit status
$RET_VAL

See the comments for an explanation of the code. The trick is to start jobs that can receive a signal from the parent job and send one to it when needed.

A child job sends a signal to the parent if its long-running job fails, and the parent sends a signal to its children after the wait (if the parent receives a signal, the wait returns immediately).
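For comparison, the same "return on first failure" behaviour can be sketched without signal handlers by using `wait -n`, which is not part of the answer above and requires bash 4.3 or newer. This is only an illustration under stated assumptions: `run_all_or_fail` is a hypothetical helper name, each argument is treated as one command line, and the calling shell is assumed to have no other background jobs.

```shell
#!/bin/bash
# Sketch only: needs bash >= 4.3 for `wait -n`.
# run_all_or_fail is a hypothetical helper; each argument is one command line.
run_all_or_fail() {
    local pids=() cmd unused
    for cmd in "$@"; do
        bash -c "$cmd" &          # start every command in the background
        pids+=("$!")
    done
    # Call `wait -n` once per started job; each call returns as soon as
    # any one background job exits, with that job's exit status.
    for unused in "${pids[@]}"; do
        if ! wait -n; then
            # First failure: terminate the survivors, reap them, report failure.
            kill "${pids[@]}" 2>/dev/null
            wait
            return 1
        fi
    done
    return 0                      # every job exited with status 0
}
```

Called as `run_all_or_fail 'sleep 5' 'false'`, the function should return 1 without waiting for the sleep to finish. Note that `kill` only signals the direct `bash -c` children, so a command that spawns its own subprocesses may leave those running.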

Ole Tange · Accepted Answer · 2015-08-28 05:49:59Z

The recent versions of GNU Parallel have focused on exactly that problem. Kill running children, if a single one fails:

parallel --halt now,fail=1 'echo {};{}' ::: true false true true false

Kill running children, if a single one succeeds:

parallel --halt now,success=1 'echo {};{}' ::: true false true true false

Kill running children, if 20% fail:

parallel -j1 --halt now,fail=20% 'echo {#} {};{}' ::: true true true false true true false true false true

Kill the child with signals TERM,TERM,TERM,KILL while waiting 50 ms between each signal:

parallel --termseq TERM,50,TERM,50,TERM,50,KILL -u --halt now,fail=1 'trap "echo TERM" SIGTERM; sleep 1;echo {};{}' ::: true false true true false
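To get the 0/1 result the question asks for, the exit status of `parallel` can be mapped in a small wrapper. The following is a minimal sketch, assuming a GNU Parallel version that accepts the `--halt now,fail=1` syntax shown above; `run_parallel_fail_fast` is a hypothetical helper name, and `-j0` is added here so that all commands start at once.

```shell
#!/bin/bash
# Sketch only: assumes GNU Parallel with the `--halt now,fail=1` syntax.
# run_parallel_fail_fast is a hypothetical helper; each argument is one command line.
run_parallel_fail_fast() {
    # {} alone as the command template runs each argument as a full command line.
    # -j0 starts as many jobs as possible at once, so a slow job
    # cannot delay the start of one that is going to fail.
    if parallel -j0 --halt now,fail=1 {} ::: "$@"; then
        return 0    # all commands succeeded
    else
        return 1    # at least one command failed; the others were killed
    fi
}
```

For example, `run_parallel_fail_fast 'sleep 5' 'false'` should return 1 once `false` has failed, rather than after the sleep completes.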

3 Comments

edi9999 Over a year ago

How can you use commands longer than one word? For example, parallel --halt now,fail=1 'echo {};{}' ::: true "echo 'hi'" fails with /bin/bash: echo 'hi': command not found

edi9999 Over a year ago

It works if I use: parallel --halt now,fail=1 "echo {}; eval {}" ::: "echo 'ho'" "echo 'hi'"

Ole Tange Over a year ago

Yup. Either use eval or only use {}: parallel --halt now,fail=1 {} ::: "echo 'ho'" "echo 'hi'". Typically it will be used as: cat file | parallel --halt now,fail=1

lnstadrum · Accepted Answer · 2021-08-26 18:35:09Z

I solved a similar situation; sharing it here in case it helps someone.

I have three commands to run.

server1 and server2 are long-running commands, e.g. webservers.
The healthcheck command runs some checks to make sure that the servers are okay. It needs both servers to be up to run its tests.

Required behavior:

If healthcheck is successful (return code is 0), block for as long as the servers are running. This is normal operation.
If healthcheck fails, return immediately. This is the desired behavior when there is a problem with the servers.

The following does the job for me:

server1 & server2 & healthcheck && wait
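The behaviour of that one-liner can be tried out with stand-in commands. In the sketch below, `server1`, `server2` and `healthcheck` are placeholder functions, and `HEALTH_RC` and `start_stack` are hypothetical names used only for the demonstration; the real commands would be whatever servers and check you run.

```shell
#!/bin/bash
# Stand-ins for illustration only; replace them with the real commands.
server1()     { sleep 2; }
server2()     { sleep 2; }
# HEALTH_RC is a hypothetical knob that picks the health check's exit status.
healthcheck() { return "${HEALTH_RC:-0}"; }

start_stack() {
    # Parsed as: `server1 &`, then `server2 &`, then `healthcheck && wait`.
    # `wait` only runs if healthcheck succeeds; otherwise the function
    # returns at once with healthcheck's non-zero status.
    server1 & server2 & healthcheck && wait
}
```

With `HEALTH_RC=0 start_stack` the call blocks until both stand-in servers exit and returns 0; with `HEALTH_RC=1 start_stack` it returns 1 immediately. In the failing case the servers are left running in the background, so a caller that wants them stopped has to kill them itself.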
