How do I schedule a job on multiple nodes with qsub Univa 8.1.7?


I want to be able to schedule a job on multiple nodes, with one process per node. I also want that process to use threads that use all of the cores available on that node. I know that "ppn" is used for scheduling PBS jobs, so I tried it with the Univa scheduler. The colon delimiter doesn't work, so I used two '-l' flags. I attempted

qsub -cwd -j y -l nodes=4 -l ppn=1 -N hellonodes mpirunscript.sh

This gives

Unable to run job: unknown resource "ppn".

Exiting.

In the man page of qsub it states

complex(5) describes how a list of available resources and their associated valid value specifiers can be obtained.

Unfortunately, no such documentation exists on the cluster I am using. However, I found one here. Eventually I discovered that to get the list of settable resource values, I needed to run

qconf -sc

This printed the following (abbreviated):

#name               shortcut   type        relop   requestable consumable default  urgency 
#------------------------------------------------------------------------------------------
...
cpu                 cpu        DOUBLE      >=      YES         NO         0        0
...
m_numa_nodes        nodes      INT         <=      YES         NO         0        0
m_socket            socket     INT         <=      YES         NO         0        0
m_thread            thread     INT         <=      YES         NO         0        0
...
num_proc            p          INT         ==      YES         NO         0        0
...
slots               s          INT         <=      YES         YES        1        1000
...

"ppn" (processes per node for PBS) was not listed, nor was anything similar that I could find. Can anyone tell me if this is possible, and if so, how?

Best answer, by Daniel:

Since it is a parallel job, you need to request a parallel environment (PE) with -pe. The admin first has to create a parallel environment that fulfills your requirements. It is then persistent and can be used for this type of parallel job. See: http://www.gridengine.eu/mangridengine/htmlman5/sge_pe.html

To create a parallel environment:

qconf -ap mype

To list all PEs:

qconf -spl

Then attach the PE to your queue by editing the queue (all.q in this case) and setting "pe_list mype":

qconf -mq all.q

The important setting is allocation_rule. Set it to 1, which means one slot (one process) per compute host.

Set slots to a high value (such as the number of cores in your cluster). It is an upper limit on the total number of slots used by all jobs running in this parallel environment.
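As a rough sketch of the settings described above, an admin could also write the PE definition to a file and load it with qconf -Ap instead of using the interactive editor. The slot limit 999 is purely illustrative, and the control_slaves and job_is_first_task values are assumptions on my part. The exact field list can differ between Grid Engine versions, so compare against the output of qconf -sp for an existing PE on your cluster before loading anything.

```shell
#!/bin/sh
# Sketch: write a parallel-environment definition to a file so an admin can
# review it and then load it with "qconf -Ap <file>".
# The PE name and slot limit are passed in; nothing here talks to the scheduler.

write_pe_definition() {
    # $1 = PE name, $2 = total slot limit for the PE, $3 = output file
    cat > "$3" <<EOF
pe_name            $1
slots              $2
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    1
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
EOF
}

# Example, run by an admin (values are illustrative assumptions):
#   write_pe_definition mype 999 mype.conf
#   qconf -Ap mype.conf
```

The only line that matters for the one-process-per-node requirement is allocation_rule 1; the rest are common defaults that your site may want to change.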

Then you or your users can submit the job:

qsub -pe mype 8 myscript.sh

Then you get 8 compute nodes for this job with 1 slot each. qstat -g t shows you where.
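Besides qstat -g t, the job itself can check where it landed. Grid Engine sets $PE_HOSTFILE for parallel jobs, and each line of that file normally has the form "hostname slots queue processor-range". The helper names below (pe_hosts, pe_total_slots) are my own and only a sketch of how a job script might read the file.

```shell
#!/bin/sh
# Sketch of job-script helpers for inspecting the hosts granted to a PE job.
# Assumes the usual $PE_HOSTFILE layout: "hostname slots queue processor-range".

# Print one hostname per line (first whitespace-separated column).
pe_hosts() {
    awk 'NF > 0 { print $1 }' "$1"
}

# Print the total number of slots granted (sum of the second column).
pe_total_slots() {
    awk 'NF > 1 { total += $2 } END { print total + 0 }' "$1"
}

# Inside a real job Grid Engine sets PE_HOSTFILE; outside one, do nothing.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    pe_hosts "$PE_HOSTFILE"
    echo "total slots: $(pe_total_slots "$PE_HOSTFILE")"
fi
```

With allocation_rule set to 1 and a request for 8 slots, the hostfile should list 8 different hosts with 1 slot each.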

Does this help?

Daniel