I have a script running condor_submit for a batch of 25 jobs, condor_wait for them all to complete, and then another condor_submit for another batch of 25 jobs.
I want to make sure none of the first 25 jobs failed, i.e. finished with something like "Normal termination (return value 127)" (any non-zero return value).
How can I easily do this? Or, if that's impossible, I'm also willing to wrap my job executable in a script that will fail the jobs in case they return non-zero, but I'm not sure how to fail an HTCondor job!
You can use condor_history: http://research.cs.wisc.edu/htcondor/manual/current/condor_history.html
If you run the following command:
It will return a space-separated list of
JobId ExitStatus
It also supports options other than just passing USERNAME.
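To act on that output from the submitting script, something like the following Python sketch might help. It parses "JobId ExitStatus" lines and reports every job whose status is not a clean zero. The helper names (parse_history, failed_jobs, query_history) are made up for illustration, and the -format arguments in query_history are only my assumption about how such a listing could be produced; check them against the condor_history manual for your version before relying on them.

```python
import subprocess


def parse_history(text):
    """Map job id -> exit status from lines of the form 'JobId ExitStatus'.

    The status is None when a line does not have exactly two fields or
    the status is not an integer, so odd lines are flagged, not ignored.
    """
    results = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        status = None
        if len(parts) == 2:
            try:
                status = int(parts[1])
            except ValueError:
                status = None
        results[parts[0]] = status
    return results


def failed_jobs(results):
    """Return only the jobs whose status is anything other than 0."""
    return {job: code for job, code in results.items() if code != 0}


def query_history(username):
    """Run condor_history and return its raw output.

    Assumption: these -format arguments print 'ClusterId.ProcId ExitCode'
    per job; they are illustrative, not taken from the original answer.
    """
    cmd = [
        "condor_history", username,
        "-format", "%d.", "ClusterId",
        "-format", "%d ", "ProcId",
        "-format", "%d\n", "ExitCode",
    ]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout
```

After condor_wait returns, the script could call failed_jobs(parse_history(query_history("USERNAME"))) and skip the second condor_submit if the result is non-empty. Treating an unparseable status as a failure is a deliberately cautious choice, since a job that was removed or killed may not report an exit code at all.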