How do I start a bunch of SGE (Sun Grid Engine) jobs where some use the -hold_jid option, without having to carefully sort the order of the qsub submissions?
If I do this, everything is fine, and job2 waits for job1 to finish:
qsub job1.sh
qsub -hold_jid job1.sh job2.sh # OK: job2 waits for job1
However, if I submit in a different order, as shown below, job2 wrongly starts without waiting for job1. Presumably this is because SGE sees that there is no job1 to wait for, since job1 has not been submitted yet.
qsub -hold_jid job1.sh job2.sh
qsub job1.sh # BAD: job2 does not wait for job1
I have tried the user hold option -h and then releasing the user hold with qalter, but releasing the user hold seems to also release the -hold_jid hold:
qsub -h -hold_jid job1.sh job2.sh
qsub -h job1.sh
qalter -h U job*.sh # BAD: job2 does not wait for job1
Building a dependency tree and submitting jobs starting from the leaf level would solve my problem, but I would like to avoid this. I am using Sun Grid Engine 6.2u3 on RHEL 6.
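For reference, the leaf-first approach mentioned above can be sketched with the coreutils `tsort` command. This is a minimal sketch under stated assumptions: dependencies are listed in a text file, one "PREREQ DEPENDENT" pair per line (e.g. `job1.sh job2.sh`), and job names equal script names, as in the qsub calls above. The function `submit_in_order` and the `QSUB` override variable are hypothetical, not part of SGE.

```shell
#!/usr/bin/env bash
# Sketch: submit jobs so that every prerequisite is submitted before
# the jobs that hold on it. submit_in_order and QSUB are hypothetical.
# QSUB defaults to the real qsub; set QSUB=echo for a dry run.

submit_in_order() {
  local deps_file="$1"   # lines of "PREREQ DEPENDENT"
  local qsub="${QSUB:-qsub}"
  local job holds
  # tsort prints every job so that each PREREQ precedes its DEPENDENT.
  for job in $(tsort "$deps_file"); do
    # Comma-separated list of this job's prerequisites, skipping
    # self-pairs such as "job4.sh job4.sh" (used for standalone jobs).
    holds=$(awk -v j="$job" '$2 == j && $1 != j { s = s (s == "" ? "" : ",") $1 } END { print s }' "$deps_file")
    if [ -n "$holds" ]; then
      "$qsub" -hold_jid "$holds" "$job"
    else
      "$qsub" "$job"
    fi
  done
}
```

A job with no dependencies at all can be listed as a pair with itself (`job4.sh job4.sh`), which `tsort` treats as an item with no ordering constraint.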
If @Vince is right that SGE really does wait for jobs that have not been submitted yet, the only way to reach your goal would be to add a unique part to all your job names.
Using job IDs won't work if jobs are not submitted in the right order, because you can't know the ID of a job that has not been submitted yet. If job2 is submitted before job1, its qsub call has no way of knowing which ID job1 will get.
For example:
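A minimal sketch of the idea, assuming the premise above holds (SGE keeps job2 on hold for a name that has not been submitted yet); if SGE instead ignores holds on unknown names, as the question reports for 6.2u3, submitting in this order will not help. Presumably the unique part keeps -hold_jid from matching a same-named job from an earlier batch. Every job in one batch gets the same unique suffix via -N, and -hold_jid refers to that suffixed name. The function `submit_batch`, the `jobN_TAG` naming scheme and the `QSUB` override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the "unique job name" idea. submit_batch, the jobN_TAG
# naming scheme and QSUB are hypothetical illustrations.
# QSUB defaults to the real qsub; set QSUB=echo for a dry run.

submit_batch() {
  local tag="$1"              # must be unique per batch of jobs
  local qsub="${QSUB:-qsub}"
  # Submitted in the "wrong" order on purpose: job2 names its
  # prerequisite by the unique name job1 is given below.
  "$qsub" -N "job2_${tag}" -hold_jid "job1_${tag}" job2.sh
  "$qsub" -N "job1_${tag}" job1.sh
}

# Example: a timestamp plus the shell's PID as the unique part.
# submit_batch "$(date +%Y%m%d%H%M%S)_$$"
```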
This way, whatever the submission order, job2 will wait for job1.