Simultaneously starting -hold_jid jobs on Sun Grid Engine


How do I start a bunch of SGE (Sun Grid Engine) jobs where some use the -hold_jid option, without having to carefully order the qsub submissions?

If I do this, everything is fine and job2 waits for job1 to finish:

qsub                   job1.sh
qsub -hold_jid job1.sh job2.sh             # OK: job2 waits for job1

However, if I submit in the opposite order, as shown below, job2 wrongly starts without waiting for job1, presumably because SGE sees no job1 to wait for, since job1 has not been submitted yet.

qsub -hold_jid job1.sh job2.sh   
qsub                   job1.sh             # BAD: job2 does not wait for job1

I have tried the user hold option -h and then releasing the user hold with qalter, but releasing the user hold seems to also release the -hold_jid hold:

qsub -h -hold_jid job1.sh job2.sh
qsub -h                   job1.sh
qalter -h U job*.sh                        # BAD: job2 does not wait for job1

Building a dependency tree and submitting the jobs starting from the leaf level would solve my problem, but I would like to avoid this. I am using Sun Grid Engine 6.2u3 on RHEL 6.
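
For reference, the dependency-ordered submission mentioned above could be sketched roughly as follows. This is only a minimal sketch: it assumes the dependencies are written as "prerequisite dependent" pairs and that the standard tsort utility is available. The deps list, the DRY_RUN switch and the submit_plan variable are hypothetical names used for illustration; with DRY_RUN=1 (the default here) the script only prints the qsub commands it would run.

```shell
#!/bin/sh
# Hypothetical dependency list: each line is "prerequisite dependent".
deps='job1.sh job2.sh
job2.sh job3.sh'

# DRY_RUN=1 only prints the commands; set DRY_RUN=0 to really call qsub.
DRY_RUN=${DRY_RUN:-1}

# tsort orders the scripts so that every prerequisite precedes its dependents.
order=$(printf '%s\n' "$deps" | tsort)

submit_plan=''
for job in $order; do
    # Collect the prerequisites of this job as a comma-separated list.
    holds=$(printf '%s\n' "$deps" |
        awk -v j="$job" '$2 == j { printf "%s%s", sep, $1; sep = "," }')
    if [ -n "$holds" ]; then
        cmd="qsub -hold_jid $holds $job"
    else
        cmd="qsub $job"
    fi
    # Keep the full plan in a variable, one command per line.
    submit_plan="${submit_plan}${cmd}
"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$cmd"
    else
        $cmd
    fi
done
```

Because every prerequisite is submitted before the jobs that hold on it, each -hold_jid refers to a job that SGE already knows about.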


There is 1 answer

Answer by Julien V

If @Vince is right and SGE really does wait for jobs that have not been submitted yet, the only way to reach your goal would be to add a unique part to all your job names.

Using the job ID won't work if the jobs are not submitted in the right order, because you can't guess the job ID of a future job: if job1 has not been submitted yet, the qsub call for job2 cannot know which job ID to wait for.

For example:

uniqueID=$(date "+%Y-%m-%d_%H-%M-%S")
qsub -N "job2_$uniqueID" -hold_jid "job1_$uniqueID" job2.sh
qsub -N "job1_$uniqueID" job1.sh

This way, whatever the submission order, job2 will wait for job1.
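
Note that a timestamp with one-second resolution is only unique if the script is never started twice within the same second. Here is a minimal sketch of a slightly more robust suffix, under the assumption that appending the shell's process ID ($$) is enough to tell apart concurrent runs on the same submit host. The variable names run_id, name1 and name2 are just illustrative, and the submission is skipped when qsub is not on the PATH:

```shell
#!/bin/sh
# Timestamp plus the PID of this shell, used as a common suffix for one run.
run_id="$(date '+%Y-%m-%d_%H-%M-%S')_$$"
name1="job1_$run_id"
name2="job2_$run_id"

if command -v qsub >/dev/null 2>&1; then
    # Same submission order as in the example above.
    qsub -N "$name2" -hold_jid "$name1" job2.sh
    qsub -N "$name1" job1.sh
else
    # No SGE client available: only report what would be submitted.
    echo "qsub not found; would submit $name2 holding on $name1"
fi
```

Both job names share the same suffix, so a second run of the script cannot accidentally hold on a job1 from an earlier run.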