Found an interesting interaction between pkill and ssh. Documenting it here for posterity:
$ ssh user@remote 'false'; echo $?
1
$ ssh user@remote 'false || echo "failed"'; echo $?
failed
0
$ ssh user@remote 'pkill -f "fake_process"'; echo $?
1
$ ssh user@remote 'pkill -f "fake_process" || echo "failed"'; echo $?
255
It seems like example #4 should have the same output as #2; both false and pkill -f "fake_process" exit with code 1 and have no output. However, #4 will always exit with code 255, even if the remote command explicitly calls exit 0. The docs for ssh state that code 255 just means "an error occurred" (super helpful).
Replacing the pkill command with (exit 1), ls fake_file, kill <non-existent PID>, etc. works as expected in every case. Additionally, when running locally (not through ssh), examples #2 and #4 match as expected.
The problem appears to be that pkill is killing itself. Or rather, it is killing the shell that owns it.
First of all, it appears that ssh uses the remote user's shell to execute certain "complicated" commands.
Second, it appears that pkill -f normally knows not to kill itself (otherwise all pkill -f commands would kill themselves). But if run from a subshell, that logic fails. The likely reason: pkill -f matches against the full command line, and the command line of a shell started as <shell> -c '<remote command>' contains the text of the whole remote command, pattern included, so the shell itself counts as a match.
In my case, to fix this I just re-worked some of the code around my ssh/pkill so that I could avoid having a "complicated" remote command. Theoretically I think you could also do something like pgrep -f <cmd> | grep -v $$ | xargs kill.
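A minimal sketch of both points, assuming a POSIX sh with ps and grep available. The first part prints the command line of a shell that is running a compound command, which is the text pkill -f would be matching against. The second part concerns the grep -v $$ filter: it matches substrings, so excluding PID 12 would also drop 123 and 312, while grep -vx (whole-line match) drops only the exact PID. The helper names drop_substring and drop_exact and the PID values are made up for illustration, and nothing here is actually killed.

```shell
# 1. A compound command (here: a || b) keeps the shell alive, and that
#    shell's own argument list contains the whole script text, including
#    the pattern. The echo branch only runs if ps itself fails.
shell_args=$(sh -c 'ps -o args= -p $$ || echo "fake_process"')
printf '%s\n' "$shell_args"

# 2. Hypothetical helpers comparing two ways to exclude one PID from a
#    newline-separated PID list read on stdin.
drop_substring() { grep -v -- "$1"; }   # same matching as: grep -v $$
drop_exact()     { grep -vx -- "$1"; }  # whole-line match only

# Made-up PID list standing in for pgrep output; 12 plays the role of $$.
pids='12
123
312
45'

# Substring match leaves only 45: 123 and 312 contain "12" and are dropped.
printf '%s\n' "$pids" | drop_substring 12

# Whole-line match leaves 123, 312 and 45: only the exact PID 12 is dropped.
printf '%s\n' "$pids" | drop_exact 12
```

With that change the workaround would read pgrep -f <cmd> | grep -vx $$ | xargs kill. It is still a theoretical sketch, and it assumes $$ is expanded by the remote shell (for example, because the remote command is single-quoted).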