
I have a bash function that I call in parallel using xargs -P, like so:

 echo ${list} | xargs -n 1 -P 24 -I@ bash -l -c 'myAwesomeShellFunction @'

Everything works fine, but the output is interleaved for obvious reasons (no buffering).

Trying to figure out a way to buffer the output effectively. I was thinking I could use awk, but I'm not good enough to write such a script and I can't find anything worthwhile on Google. Can someone help me write this "output buffer" in sed or awk? Nothing fancy, just accumulate output and spit it out after the process terminates. I don't care about the order in which the shell functions execute, I just need their output buffered... Something like:

 echo ${list} | xargs -n 1 -P 24 -I@ bash -l -c 'myAwesomeShellFunction @ | sed -u ""'

P.S. I tried to use stdbuf as per https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe, but it did not work; I specified line buffering for stdout (-o) and stderr (-e), but the output still comes out interleaved:

 echo ${list} | xargs -n 1 -P 24 -I@ stdbuf -i0 -oL -eL bash -l -c 'myAwesomeShellFunction @'

Here's my first attempt; it only captures the first line of output:

 $ bash -c "echo stuff;sleep 3; echo more stuff" | awk '{while (( getline line) > 0 )print "got ",$line;}'
 $ got  stuff
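
For reference, one shape the filter being asked for could take in plain awk — accumulate everything and print it only at end-of-input, in place of the `sed -u ""` placeholder above (just a sketch; a single large write is still not guaranteed to be atomic across parallel writers):

 awk '{buf = buf $0 ORS} END {printf "%s", buf}'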
  • Don't tell me to use GNU parallel, I don't have it installed and getting it installed is questionable as I'm running on RHEL6 Commented Jun 15, 2017 at 14:01
  • Why not redirect outputs to a file, using the -I@ to generate a unique filename for each task? You can then cat all the files. You can even use mktemp Commented Jun 15, 2017 at 14:32
  • Don't wanna mess around with files Commented Jun 15, 2017 at 14:36
  • BTW, -I@ is problematic being substituted into bash -c if you don't trust your data. If one of your list entries contains $(/tmp/evil-prog), then you just had code injected. Much, much safer to pass data out-of-band from code. Commented Jun 15, 2017 at 14:36
  • @BenjaminW., turning on line buffering tells an individual instance not to try to perform writes shorter than a single line (assuming you don't have a single line that goes over your buffer size), but it doesn't stop your lines from being interleaved across multiple instances. Even larger-than-line buffers don't prevent the boundaries of the buffer (and thus the boundaries of a write) from being at an undesirable cutoff point. Commented Oct 31, 2017 at 18:21

1 Answer


This isn't quite atomic if your output is longer than a page (4kb typically), but for most cases it'll do:

xargs -P 24 bash -c 'for arg; do printf "%s\n" "$(myAwesomeShellFunction "$arg")"; done' _

The magic here is the command substitution: $(...) creates a subshell (a fork()ed-off copy of your shell), runs the code inside it, and reads its output back in, substituting it at the relevant position in the outer script.
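As a tiny illustration (made-up output, not part of the answer's command): the substitution gathers all of the inner command's output before the outer printf ever runs, so the whole block goes out in one go:

# illustration only: the command substitution collects the inner command's
# complete output first; the single outer printf then writes it all at once
printf '%s\n' "$(printf 'line one\nline two')"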

Note that -n 1 isn't needed if you're dealing with a large number of arguments (for a small number it may improve parallelization), since the loop iterates over however many arguments each of your 24 parallel bash instances is passed.
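
To see the difference (a throwaway demo with made-up arguments, not part of the answer's command), compare how xargs spreads a short list with and without -n 1:

# without -n, a short argument list may all land in a single bash instance
printf '%s\n' a b c d | xargs -P 4 bash -c 'echo "PID $$ got: $*"' _
# with -n 1, each argument gets its own instance, restoring parallelism for short lists
printf '%s\n' a b c d | xargs -n 1 -P 4 bash -c 'echo "PID $$ got: $*"' _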


If you want to make it truly atomic, you can do that with a lockfile:

# generate a lockfile, arrange for it to be deleted when this shell exits
lockfile=$(mktemp -t lock.XXXXXX); export lockfile
trap 'rm -f "$lockfile"' 0

xargs -P 24 bash -c '
  for arg; do
    {
      output=$(myAwesomeShellFunction "$arg")
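      # take an exclusive lock on FD 99 (opened on "$lockfile" below) before
      # writing, so each instance's whole block of output comes out uninterleaved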
      flock -x 99
      printf "%s\n" "$output"
    } 99>"$lockfile"
  done
' _

4 Comments

That's really clever, I didn't realize command substitution did that. Output can only occupy one page? Mine is pretty much guaranteed to be less than 4kb, unless something really unusual happens. The number of arguments is typically <24 (that's the upper limit); -n 1 was there just for safety, really... This worked, so awesome!!!
It can occupy multiple pages, but has more of a chance of being split into multiple writes in that case, so if you had multiple processes finishing at the same time (within a few milliseconds of each other?) combined with multi-page output, then you might lose the atomicity.
When the function outputs nothing, the script still prints a newline. But if I remove the \n from printf, then the output from multiple scripts is glued together on the same line. How do I avoid this while not printing a newline for empty output?
@rustyx, if you're using GNU xargs, see -r / --no-run-if-empty if the problem is no arguments. If the problem is the shell loop continuing when myAwesomeShellFunction has no output, then you can use [[ $output ]] || continue or an equivalent (such as [[ $output ]] && { flock -x 99 && printf '%s\n' "$output"; })
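
A sketch of the flock loop above with that guard applied (same setup as the answer; when the function produces no output, nothing is printed and no lock is taken):

xargs -P 24 bash -c '
  for arg; do
    {
      output=$(myAwesomeShellFunction "$arg")
      [[ $output ]] || continue    # skip the trailing newline for empty output
      flock -x 99
      printf "%s\n" "$output"
    } 99>"$lockfile"
  done
' _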
