Restart job in Condor after a certain amount of time


I am running jobs on Condor and have noticed that, for some reason, a subset of my jobs will run but never complete. Is there a setting in the submit file that kills and then resubmits a job if it takes longer than a certain amount of time to complete? This is similar to the question "Condor Timeout for idle jobs", except that I want Condor not just to kill the jobs but to resubmit them as well.

Thanks!


There is 1 answer

SCa On

You can use the KILL transition expression in the machine ClassAd configuration (see the Condor user manual). Something like:

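START = True
...
# Maximum run time in seconds; the leading + is submit-file syntax for a custom job attribute
+MaxJobExecutionTime = xxx
# Machine policy: kill a job once its current activity has lasted longer than the limit
KILL = $(ActivityTimer) > MaxJobExecutionTime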

With this setup, the machine kills any job that runs for more than MaxJobExecutionTime seconds. Condor will then retry the job.
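If you cannot change the machine configuration, a similar effect can probably be had from the submit file alone by putting a long-running job on hold and then releasing it so that it starts over. This is a different technique from the KILL expression, shown only as a minimal sketch: the one-hour limit and the cap of three starts are assumed values, not recommendations, and the expressions are only checked periodically, so the timeout is approximate.

```
# Put the job on hold once it has been running (JobStatus == 2) for more than 3600 seconds (assumed limit)
periodic_hold = (JobStatus == 2) && ((time() - EnteredCurrentStatus) > 3600)

# Release a held job (JobStatus == 5) only if periodic_hold caused the hold (HoldReasonCode == 3)
# and it has been started fewer than 3 times (assumed retry cap)
periodic_release = (JobStatus == 5) && (HoldReasonCode == 3) && (NumJobStarts < 3)
```

These lines go in the submit file before the queue statement. A released job returns to the idle state and is matched again, so without checkpointing it restarts from the beginning.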