Why does running "pkill -f <anything>" over ssh fail only when branching on its result?


Found an interesting interaction between pkill and ssh. Documenting it here for posterity:

$ ssh user@remote 'false'; echo $?
1

$ ssh user@remote 'false || echo "failed"'; echo $?
failed
0

$ ssh user@remote 'pkill -f "fake_process"'; echo $?
1

$ ssh user@remote 'pkill -f "fake_process" || echo "failed"'; echo $?
255

It seems like the fourth example should produce the same output as the second: both false and pkill -f "fake_process" exit with code 1 and print nothing. However, the fourth example always exits with code 255, even if the remote command explicitly calls exit 0. The docs for ssh state that code 255 just means "an error occurred" (super helpful).

Replacing the pkill command with (exit 1), ls fake_file, kill <non-existent PID>, etc. works as expected in every case. Additionally, when run locally (not through ssh), the pkill and false versions behave identically.

1 answer

Best answer, by 0x5453:

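The problem appears to be that pkill is killing itself. Or rather, it is killing the shell that launched it: with -f the pattern is matched against full command lines, and the remote shell's command line contains the pattern too. The 255 from ssh is then presumably its way of reporting that the remote shell died from a signal instead of exiting normally.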

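First of all, ssh hands the command string to the remote user's shell. For a single simple command the shell (bash here) appears to exec the command directly, so no shell process is left behind; for a "complicated" command the shell stays alive, with the whole command string in its own command line: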

$ ssh user@remote 'ps -F --pid $$'
UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
user      9531  9526  0 11862  1616   6 14:36 ?        00:00:00 ps -F --pid 9531

$ ssh user@remote 'ps -F --pid $$ && echo hi'
UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
user      9581  9577  0 28316  1588   5 14:36 ?        00:00:00 bash -c ps -F --pid $$ && echo hi
hi

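Second, it appears that pkill -f normally knows not to kill itself (otherwise every pkill -f command would kill itself, since its own command line always contains the pattern). But that protection does not extend to a parent shell started with -c, whose command line contains the pattern as well: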

$ pkill -f fake_process; echo $?
1

$ sh -c 'pkill -f fake_process'; echo $?
[1]    14031 terminated  sh -c 'pkill -f fake_process'
143
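
The 143 above is 128 + 15, which is how a shell reports a child that died from SIGTERM (signal 15, pkill's default). A minimal sketch that reproduces the same status without pkill, assuming only a POSIX sh; the self-directed kill is a stand-in for pkill -f matching its parent shell:

```shell
# Start a child shell that sends SIGTERM to itself. This stands in for
# "pkill -f" matching the command line of the shell that launched it.
status=0
sh -c 'kill -TERM $$; echo "not reached"' || status=$?

# A shell reports "killed by signal N" as 128 + N; SIGTERM is signal 15.
echo "exit status: $status"
```

The echo inside the child never runs, which mirrors why || echo "failed" prints nothing in the ssh case: the shell that would have run it is already gone.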

In my case, to fix this I just re-worked some of the code around my ssh/pkill so that I could avoid having a "complicated" remote command. Theoretically I think you could also do something like pgrep -f <cmd> | grep -vx $$ | xargs -r kill (-x so that only the shell's exact PID is filtered out, -r so that kill is not run with an empty list).
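
Another common workaround, not part of the fix described above, is to write the pattern so that it cannot match its own text, e.g. pkill -f "[f]ake_process". The regex still matches fake_process, but the shell's command line contains the literal characters [f]ake_process, which it does not match. The sketch below checks that reasoning with grep -E instead of a real pkill; both command lines are made-up examples, and the target path is hypothetical:

```shell
# Pattern written with a bracket expression so it does not match itself.
pattern='[f]ake_process'

# Hypothetical command lines: the remote shell that ssh would start,
# and the process we actually want to kill.
shell_cmdline="bash -c pkill -f \"$pattern\" || echo \"failed\""
target_cmdline='/usr/bin/fake_process --daemon'

# pkill -f matches an extended regex against the full command line;
# grep -E is used here as a stand-in for that matching step.
matches() { printf '%s\n' "$1" | grep -Eq -- "$pattern"; }

matches "$target_cmdline" && target_hit=yes || target_hit=no
matches "$shell_cmdline" && shell_hit=yes || shell_hit=no

echo "target matched: $target_hit"
echo "shell matched: $shell_hit"
```

With that pattern, the remote command would look like ssh user@remote 'pkill -f "[f]ake_process" || echo "failed"', and the shell running it should no longer be among the matches.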