SGE / UGE suspend running jobs

I'm aware that one can suspend running jobs with the qmod -sj [jobid] command, and in principle that works: the jobs go to the suspended (s) state. Fine so far, but:

I expected that if I put all running jobs into the suspended state and then submit new jobs to GE with qsub (or already have jobs waiting), those jobs would start running. That is not the case.

Some searching on this topic led me to http://gridengine.org/pipermail/users/2011-February/000050.html, which does suggest that suspending jobs frees GE to run other ones.

Answer by Taylor Hamilton

See here:

In a workload manager with "built-in" preemption, like Platform LSF, it works by temporarily relaxing the slot count limit on a node and then resolving the oversubscription by bumping the lowest job on the totem pole to get the number of jobs back under the slot count limit. In Sun Grid Engine, the same thing happens, except that instead of the scheduler temporarily relaxing the slot count limits, you as the administrator configure the host with more slots than you need and a set of rules that create an artificial lower limit on the job count that is enforced by bumping the lowest priority jobs.
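To make the slot accounting concrete, here is a toy Python model of the mechanism the quote describes. It is a sketch under stated assumptions, not how the SGE scheduler is actually implemented. The names Job, Host, active_limit and the "higher number = higher priority" convention are all hypothetical. The model assumes that suspended jobs keep holding their slots, which is why manually suspending everything does not let waiting jobs start. It also assumes that an oversized slot count combined with a rule that suspends the lowest-priority jobs gives you preemption. Resuming suspended jobs when others finish is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    priority: int      # hypothetical: higher value = more important
    state: str = "qw"  # "r" = running, "s" = suspended, "qw" = waiting


@dataclass
class Host:
    slots: int         # configured slot count (deliberately oversized for preemption)
    active_limit: int  # artificial lower limit on the number of running jobs
    jobs: list = field(default_factory=list)

    def used_slots(self) -> int:
        # Assumption of this model: suspended jobs still hold their slots,
        # only waiting jobs do not count.
        return sum(1 for j in self.jobs if j.state in ("r", "s"))

    def submit(self, job: Job) -> None:
        # A new job starts only if a free slot exists; otherwise it waits.
        job.state = "r" if self.used_slots() < self.slots else "qw"
        self.jobs.append(job)
        self.enforce_limit()

    def enforce_limit(self) -> None:
        # Suspend the lowest-priority running jobs until the running count
        # is back under the artificial limit ("bumping" them).
        running = sorted((j for j in self.jobs if j.state == "r"),
                         key=lambda j: j.priority)
        excess = len(running) - self.active_limit
        for j in running[:max(excess, 0)]:
            j.state = "s"


# The question's situation: slots == real capacity, jobs suspended by hand.
full = Host(slots=2, active_limit=2)
a, b = Job(1, priority=1), Job(2, priority=1)
full.submit(a)
full.submit(b)
a.state = b.state = "s"  # like running qmod -sj on both jobs
c = Job(3, priority=10)
full.submit(c)           # c stays "qw": the suspended jobs still use both slots

# The quoted approach: more slots than needed plus a suspend rule.
over = Host(slots=4, active_limit=2)
x, y, z = Job(1, priority=1), Job(2, priority=2), Job(3, priority=10)
for j in (x, y, z):
    over.submit(j)       # z starts, and the lowest-priority job x is suspended
```

In the second host the extra slots let the high-priority job start at all, and the limit rule then restores the intended number of running jobs by suspending the least important one.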

The quote is about a slightly different topic, but the same principle seems to hold: suspended jobs still count against a host's slot limit. So to run other jobs while keeping your suspended ones, temporarily increase the slot counts on the relevant nodes.
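A minimal Python sketch of that workflow builds the admin commands and, by default, only prints them (dry run). qmod -sj comes from the question, and qmod -usj is its unsuspend counterpart. The qconf -mattr queue slots <n> <queue>@<host> form is an assumption on my part; check it against the qconf man page of your SGE/UGE version before running it for real. The queue name all.q, the host node01, the job id and the slot count are hypothetical placeholders.

```python
import shlex
import subprocess


def suspend_cmd(job_id: int) -> list[str]:
    # Suspend a running job (the command from the question).
    return ["qmod", "-sj", str(job_id)]


def resume_cmd(job_id: int) -> list[str]:
    # Unsuspend the job again later.
    return ["qmod", "-usj", str(job_id)]


def set_slots_cmd(queue: str, host: str, slots: int) -> list[str]:
    # Assumed syntax: change the slot count of one queue instance (queue@host).
    return ["qconf", "-mattr", "queue", "slots", str(slots), f"{queue}@{host}"]


def run(cmd: list[str], dry_run: bool = True) -> str:
    # With dry_run=True nothing is executed; the shell-quoted command is returned.
    if not dry_run:
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


# Hypothetical plan: suspend job 1234, then raise node01's slots in all.q
# so waiting jobs can start alongside the suspended one.
plan = [suspend_cmd(1234), set_slots_cmd("all.q", "node01", 16)]
for cmd in plan:
    print(run(cmd))
```

Once the suspended jobs are resumed (qmod -usj), the same set_slots_cmd call with the original value would put the slot count back.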