Forwarding signals in a bash script that is submitted to a cluster

I have a launch.sh script which I submit on the cluster with

bsub $settings < launch.sh

Simplified, the launch.sh bash script looks like this:

function trap_with_arg() {
    func="$1" ; shift
    for sig ; do
        echo "$ES Installing trap for signal $sig"
        trap "$func $sig" "$sig"
    done
}
function signalHandler() {
    # do stuff depending on what stage the script is in
    :  # no-op placeholder: a function body containing only a comment is a syntax error
}

# Setup the Trap
trap_with_arg signalHandler SIGINT SIGTERM SIGUSR1 SIGUSR2 

./start.sh
mpirun process.sh
./end.sh

where process.sh calls, as an example, two binaries:

./binaryA 
./binaryB

My question is the following: the cluster already sends SIGUSR1 (approx. 10 min before SIGTERM) to the process (I think this is the bash shell running my launch.sh script).

At the moment I catch this signal in the launch.sh script and call a signal handler. The problem is that, as far as I know, this handler only gets executed after the currently running command has finished (e.g. mpirun process.sh or ./start.sh).

How can I forward these signals so that the commands/binaries exit gracefully, for example to process.sh? (mpirun, as I have experienced, already forwards the signals it receives somehow; how does it do that?) What is the proper way of forwarding signals, e.g. also to the binaries binaryA and binaryB? I have no good idea how to do this. Should I run the commands in the background, or create a child process?

Thanks for some enlightenment :-)

1 answer

Answered by pasaba por aqui

From the Bash manual at http://www.gnu.org/software/bash/manual/html_node/Signals.html:

If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes. When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.

Thus, the solution seems to be to place the commands in the background and use "wait":

something &
wait
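
To make the idea more concrete, here is a minimal sketch of how the background-plus-wait pattern could be combined with forwarding. The names forward_signal, run_forwarding and child_pid are hypothetical (they do not appear in the question), and the signal list simply mirrors the one in launch.sh. It is meant as an illustration under these assumptions, not as a tested cluster setup.

```shell
#!/usr/bin/env bash
# Sketch: start a long-running command in the background and block in `wait`,
# so that trapped signals are handled right away and can be passed on.

child_pid=""   # PID of the command currently running in the background

# Hypothetical handler: pass the received signal on to the background child.
forward_signal() {
    fwd_sig="$1"
    if [ -n "$child_pid" ]; then
        kill -s "$fwd_sig" "$child_pid" 2>/dev/null
    fi
}

# Hypothetical helper: run a command in the background and wait for it.
run_forwarding() {
    "$@" &
    child_pid=$!
    # `wait` returns early (status > 128) when a trapped signal arrives,
    # so keep waiting for as long as the child process still exists.
    wait "$child_pid"
    status=$?
    while kill -0 "$child_pid" 2>/dev/null; do
        wait "$child_pid"
        status=$?
    done
    child_pid=""
    # Best effort: if the child exits in the instant between an interrupted
    # wait and the kill -0 check, this is 128 + the signal number instead
    # of the child's own exit code.
    return "$status"
}

# Install the forwarding handler for the same signals as in launch.sh.
for sig in INT TERM USR1 USR2; do
    trap "forward_signal $sig" "$sig"
done
```

In launch.sh, the line "mpirun process.sh" would then become "run_forwarding mpirun process.sh". The same pattern could presumably be used inside process.sh for binaryA and binaryB. One caveat: when job control is off (the usual case in a script), bash starts background commands with SIGINT and SIGQUIT ignored, so forwarding SIGINT only has an effect if the child installs its own handler for it.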