All my Slurm jobs fail with exit code 0:53 within two seconds of starting. When I look at the job details with scontrol show jobid <JOBID>, it doesn't say anything suspicious. When I look at the files that stdout and stderr write to, there is nothing there. I couldn't find anything on the listed signal 53.
It turns out that the directory Slurm was supposed to write the stdout and stderr files to didn't exist.
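A rough way to check for this, sketched under the assumption that the output of scontrol show job contains StdOut= and StdErr= fields with the resolved file paths (true on the versions I know of, but worth verifying on yours). The function name check_output_dirs is made up for illustration, and paths containing spaces are not handled.

```shell
# Hypothetical helper: read `scontrol show job` output on stdin and report
# any stdout/stderr parent directory that does not exist.
check_output_dirs() {
    # Keep only the StdOut=/StdErr= fields, then strip the key to get the path.
    grep -oE 'Std(Out|Err)=[^ ]+' | cut -d= -f2- | while IFS= read -r path; do
        dir=$(dirname "$path")
        # Print a line only for directories that are missing.
        [ -d "$dir" ] || echo "missing: $dir"
    done
}
```

It would be used by piping the job details into it, for example scontrol show job <JOBID> | check_output_dirs, run on a node that sees the same filesystem as the job.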
In my submit.sh script, the lines that set the stdout and stderr paths pointed into a log directory, and that log directory didn't exist in the current working directory from which I was submitting the job. Once I created the directory, Slurm jobs no longer failed with 0:53.

My Slurm version is 22.05.2. Per this answer, Slurm no longer fails silently when the output directory doesn't exist from version 23.02 upwards. This seems to have been reported in this issue.
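On older versions, one workaround is to create the directory before submitting. This is a minimal sketch, assuming the job script is called submit.sh and writes its output under a log directory relative to the submission directory; the LOG_DIR variable is introduced here for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: make sure the log directory exists before submitting.

# Assumption: submit.sh writes stdout/stderr under ./log, relative to the
# directory the job is submitted from.
LOG_DIR="${LOG_DIR:-log}"

# mkdir -p succeeds whether or not the directory already exists.
mkdir -p "$LOG_DIR"

# Submit only where sbatch and submit.sh are present, so the sketch is
# harmless on a machine without Slurm.
if command -v sbatch >/dev/null 2>&1 && [ -f submit.sh ]; then
    sbatch submit.sh
fi
```

The mkdir has to happen before sbatch is called. Putting it inside the job script is too late, since Slurm opens the output files before the script body runs.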