All my slurm jobs fail with exit code 0:53 within two seconds of starting.
When I look at the job details with scontrol show jobid <JOBID>, nothing looks suspicious.
When I look at the files that stdout and stderr write to, there is nothing there.
I couldn't find anything on the listed signal 53.
It turns out that the directory containing the files that slurm was supposed to write stdout and stderr to didn't exist.
In my submit.sh script, the relevant lines pointed stdout and stderr at files inside a log directory. That log directory, relative to the current working directory from which I was submitting the job, didn't exist. Once I created the directory, slurm jobs no longer failed with 0:53.
My slurm version is 22.05.2. Per this answer, from version 23.02 upwards slurm no longer errors silently when the output directory doesn't exist. This seems to have been reported in this issue.
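As a rough sketch of the workaround on older versions, the output directory can be created just before submitting. The log directory name and submit.sh come from the description above. The ensure_log_dir helper and the #SBATCH lines shown in the comments are hypothetical illustrations, not the original script.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: make sure the directory for the job's stdout/stderr
# files exists before handing the script to sbatch.
set -euo pipefail

# Directory the job script is assumed to write into, for example via
# directives of this shape (illustrative only, not the original lines):
#   #SBATCH --output=log/%j.out
#   #SBATCH --error=log/%j.err
LOG_DIR="log"

ensure_log_dir() {
    # mkdir -p succeeds whether or not the directory already exists.
    mkdir -p "$1"
}

ensure_log_dir "$LOG_DIR"

# Submit only if sbatch and the job script are actually present, so the
# sketch is harmless on a machine without slurm.
if command -v sbatch >/dev/null 2>&1 && [ -f submit.sh ]; then
    sbatch submit.sh
fi
```

Run it from the same directory you would normally call sbatch from, since a relative path like log is resolved against the submission directory.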