
I have a bash function that I call in parallel using xargs -P, like so:

 echo ${list} | xargs -n 1 -P 24 -I@ bash -l -c 'myAwesomeShellFunction @'

Everything works fine, but the output is interleaved for obvious reasons (no buffering).

Trying to figure out a way to buffer the output effectively. I was thinking I could use awk, but I'm not good enough to write such a script and I can't find anything worthwhile on Google. Can someone help me write this "output buffer" in sed or awk? Nothing fancy, just accumulate output and spit it out after the process terminates. I don't care about the order in which the shell functions execute, I just need their output buffered... Something like:

 echo ${list} | xargs -n 1 -P 24 -I@ bash -l -c 'myAwesomeShellFunction @ | sed -u ""'

P.S. I tried to use stdbuf as per https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe, but it did not work; I specified line buffering for stdout (-o) and stderr (-e), but the output still comes out interleaved:

 echo ${list} | xargs -n 1 -P 24 -I@ stdbuf -i0 -oL -eL bash -l -c 'myAwesomeShellFunction @'

Here's my first attempt; it only captures the first line of output:

 $ bash -c "echo stuff;sleep 3; echo more stuff" | awk '{while (( getline line) > 0 )print "got ",$line;}'
 $ got  stuff
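
For reference, one shape the filter being asked for could take in plain awk — accumulate everything and print it only at end-of-input, in place of the `sed -u ""` placeholder above (just a sketch; a single large write is still not guaranteed to be atomic across parallel writers):

 awk '{buf = buf $0 ORS} END {printf "%s", buf}'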
  • Don't tell me to use GNU parallel, I don't have it installed and getting it installed is questionable as I'm running on RHEL6 Commented Jun 15, 2017 at 14:01
  • Why not redirect outputs to a file, using the -I@ to generate a unique filename for each task? You can then cat all the files. You can even use mktemp Commented Jun 15, 2017 at 14:32
  • Don't wanna mess around with files Commented Jun 15, 2017 at 14:36
  • BTW, -I@ is problematic being substituted into bash -c if you don't trust your data. If one of your list entries contains $(/tmp/evil-prog), then you just had code injected. Much, much safer to pass data out-of-band from code. Commented Jun 15, 2017 at 14:36
  • @BenjaminW., turning on line buffering tells an individual instance not to try to perform writes shorter than a single line (assuming you don't have a single line that goes over your buffer size), but it doesn't stop your lines from being interleaved across multiple instances. Even larger-than-line buffers don't prevent the boundaries of the buffer (and thus the boundaries of a write) from being at an undesirable cutoff point. Commented Oct 31, 2017 at 18:21

1 Answer


This isn't quite atomic if your output is longer than a page (4kb typically), but for most cases it'll do:

xargs -P 24 bash -c 'for arg; do printf "%s\n" "$(myAwesomeShellFunction "$arg")"; done' _

The magic here is the command substitution: $(...) creates a subshell (a fork()ed-off copy of your shell), runs the code inside it, and reads its output back in, substituting it at the relevant position in the outer script.
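As a tiny illustration (made-up output, not part of the answer's command): the substitution gathers all of the inner command's output before the outer printf ever runs, so the whole block goes out in one go:

# illustration only: the command substitution collects the inner command's
# complete output first; the single outer printf then writes it all at once
printf '%s\n' "$(printf 'line one\nline two')"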

Note that -n 1 isn't needed if you're dealing with a large number of arguments (for a small number it may improve parallelization), since the loop iterates over however many arguments each of your 24 parallel bash instances is passed.
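
To see the difference (a throwaway demo with made-up arguments, not part of the answer's command), compare how xargs spreads a short list with and without -n 1:

# without -n, a short argument list may all land in a single bash instance
printf '%s\n' a b c d | xargs -P 4 bash -c 'echo "PID $$ got: $*"' _
# with -n 1, each argument gets its own instance, restoring parallelism for short lists
printf '%s\n' a b c d | xargs -n 1 -P 4 bash -c 'echo "PID $$ got: $*"' _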


If you want to make it truly atomic, you can do that with a lockfile:

# generate a lockfile, arrange for it to be deleted when this shell exits
lockfile=$(mktemp -t lock.XXXXXX); export lockfile
trap 'rm -f "$lockfile"' 0

xargs -P 24 bash -c '
  for arg; do
    {
      output=$(myAwesomeShellFunction "$arg")
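      # take an exclusive lock on FD 99 (opened on "$lockfile" below) before
      # writing, so each instance's whole block of output comes out uninterleaved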
      flock -x 99
      printf "%s\n" "$output"
    } 99>"$lockfile"
  done
' _

4 Comments

That's really clever, I didn't realize command substitution did that. Output can only occupy one page? Mine is pretty much guaranteed to be less than 4kb, unless something really unusual happens. The number of arguments is typically <24 (that's the upper limit); -n 1 was there just for safety, really... This worked, so awesome!!!
It can occupy multiple pages, but has more of a chance of being split into multiple writes in that case, so if you had multiple processes finishing at the same time (within a few milliseconds of each other?) combined with multi-page output, then you might lose the atomicity.
When the function outputs nothing, the script still prints a newline. But if I remove the \n from printf, then the output from multiple scripts is glued together on the same line. How do I avoid this while not printing a newline for empty output?
@rustyx, if you're using GNU xargs, see -r / --no-run-if-empty if the problem is no arguments. If the problem is the shell loop continuing when myAwesomeShellFunction has no output, then you can use [[ $output ]] || continue or an equivalent (such as [[ $output ]] && { flock -x 99 && printf '%s\n' "$output"; })
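
A sketch of the flock loop above with that guard applied (same setup as the answer; when the function produces no output, nothing is printed and no lock is taken):

xargs -P 24 bash -c '
  for arg; do
    {
      output=$(myAwesomeShellFunction "$arg")
      [[ $output ]] || continue    # skip the trailing newline for empty output
      flock -x 99
      printf "%s\n" "$output"
    } 99>"$lockfile"
  done
' _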
