Goal
Execute a long-running command remotely with the following capabilities:
- We can kill the task with its PID
- If task completes, its exit code should be written to file
- Task should be detached from launching ssh session
- (Stretch goal) STDERR and STDOUT should use line-buffering as they are written to file
For my use case, multiple instances of this task will be launched concurrently on different servers. However, all of the servers (including the launch server) will be using the same working directory (it's mounted at the same location on all servers).
Issue
As of right now I am able to meet the first 3 requirements. The task is a C++ program named main, and we can assume the relative path from $HOME to our working directory is working_directory_path (written as {wdpath} in the ssh command). The current solution uses two helper scripts (run.sh and helper.sh) to achieve its goals.
First, the ssh command:
$ ssh {server} "cd {wdpath}; ./run.sh"
Then we use run.sh to detach from the ssh session and write the PID to a file.
#!/bin/bash
nohup ./helper.sh &>/dev/null &
echo $! > process.pid
Next, we use helper.sh to record the exit status of our program.
#!/bin/bash
./main > process.stdout 2> process.stderr
echo $? > process.status
This seems to work, but there are a few things I don't like.
- I have to create 2 temporary scripts per remote invocation, as each invocation of main will be fed some command line arguments that differ (and I will need to adjust the output file names).
- STDOUT and STDERR are being block buffered due to nohup.
- The PID I'm recording doesn't actually belong to the main process, but its parent shell.
Is it possible to improve my solution so that some of these issues are resolved?
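One possible direction, sketched under assumptions: a single reusable script can take the output prefix and the command as arguments, start the task in the background from a watcher subshell, record the task's own PID via $!, and then wait on it to capture the exit status. The names launch_detached, launch.sh and the tag argument are hypothetical, not part of the original scripts, and the subshell ignores SIGHUP in place of nohup.

```shell
#!/bin/bash
# launch_detached TAG CMD [ARGS...]   (hypothetical helper, not from the original scripts)
# Writes TAG.pid, TAG.stdout, TAG.stderr and, once CMD finishes, TAG.status.
launch_detached() {
    local tag=$1
    shift
    (
        trap '' HUP                               # ignore hangups, roughly what nohup does
        "$@" > "$tag.stdout" 2> "$tag.stderr" &   # start the real task in the background
        echo "$!" > "$tag.pid"                    # PID of the task itself, not of a wrapper shell
        wait "$!"                                 # block until the task exits
        echo "$?" > "$tag.status"                 # exit status (128+N if killed by signal N)
    ) < /dev/null > /dev/null 2>&1 &              # detach the watcher subshell from the caller
}

# When run as a script with arguments, forward them to the helper.
if [ "$#" -gt 0 ]; then
    launch_detached "$@"
fi
```

Saved as launch.sh (again, a made-up name), this might be invoked as ssh {server} "cd {wdpath}; ./launch.sh run1 ./main arg1 arg2", and the task stopped later with kill "$(cat run1.pid)". Since all servers share the working directory, the tag would need to be unique per invocation.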
fork() + setsid() for example.
nohup sometimes redirects the standard streams of the commands it runs, but it does not manipulate their buffering, and in this case it won't do anything with them at all because you already redirect to /dev/null, which is not a terminal. What's more, that's one layer removed from execution of your ./main anyway. Your ./main's STDOUT is fully buffered because it's connected to a regular file. On the other hand, your STDERR is not fully buffered (it's probably unbuffered) unless ./main is manipulating that itself or something is not happening according to spec.
setvbuf() can be used to adjust the buffering of the standard streams from within your program. For invocation from C++, I guess you would include <cstdio>, and call it as std::setvbuf
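If changing the program itself is inconvenient, a possible alternative for the line-buffering stretch goal is the stdbuf utility from GNU coreutils, which adjusts the initial stdio buffering of the command it runs. The sketch below assumes stdbuf is installed on the servers, that the program writes through C stdio, and that it does not set its own buffering; the function name run_line_buffered is hypothetical.

```shell
#!/bin/bash
# run_line_buffered OUT ERR CMD [ARGS...]   (hypothetical helper)
# Runs CMD with stdout and stderr line-buffered, redirected to OUT and ERR.
# Assumes GNU coreutils' stdbuf is available and CMD uses C stdio for output.
run_line_buffered() {
    local out=$1 err=$2
    shift 2
    # -oL / -eL ask for line buffering on stdout / stderr respectively.
    stdbuf -oL -eL "$@" > "$out" 2> "$err"
}
```

For example, run_line_buffered process.stdout process.stderr ./main would replace the plain redirection in helper.sh. Programs that call setvbuf() themselves, or that bypass stdio, are not affected by stdbuf.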