How to set slurm/salloc for 1 GPU per task but let the job use multiple GPUs?


We are looking for some advice with Slurm salloc GPU allocations. Currently, given:

% salloc -n 4 -c 2 --gres=gpu:1
% srun env | grep CUDA   
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0

However, we want more than just device 0 to be used.
Is there a way to set up salloc with srun/mpirun so that we get the following?

CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=1
CUDA_VISIBLE_DEVICES=2
CUDA_VISIBLE_DEVICES=3

The goal is for each task to get 1 GPU, with overall GPU usage spread across the 4 available devices (see gres.conf below), rather than every task getting device 0.

That way each task is not waiting on device 0 to free up from other tasks, as is currently the case.

Or is this the expected behavior even when we have more than 1 GPU available/free (4 total) for the 4 tasks? What are we missing or misunderstanding?

  • salloc / srun parameter?
  • slurm.conf or gres.conf setting?

Summary: We want to use Slurm and MPI such that each rank/task uses 1 GPU, but the job can spread its tasks/ranks across the 4 GPUs. Currently it appears we are limited to device 0 only. We also want to avoid multiple srun submissions within an salloc/sbatch because of how we use MPI.
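
For context, the kind of single-launch batch script we are aiming for looks roughly like this (the application name is just a placeholder, and the GPU request line is exactly the part we are unsure about):

#!/bin/bash
#SBATCH -n 4
#SBATCH -c 2
#SBATCH --gres=gpu:4        # or whatever request gives each task its own GPU
# Single launch for all 4 ranks; each rank should end up on a different device.
srun ./our_mpi_gpu_app      # placeholder application name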

OS: CentOS 7

Slurm version: 16.05.6

Are we forced to use wrapper-based methods for this?

Are there differences between Slurm versions (14 to 16) in how GPUs are allocated?

Thank you!

Reference: gres.conf

Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=/dev/nvidia2
Name=gpu File=/dev/nvidia3
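
For completeness, the matching pieces of slurm.conf look roughly like this (the node name and the trailing settings here are illustrative, not our exact config):

GresTypes=gpu
NodeName=gpunode01 Gres=gpu:4 ...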

There are 3 answers

Answer by damienfrancois:

First of all, try requesting four GPUs with

% salloc -n 4 -c 2 --gres=gpu:4

With --gres=gpu:1, it is the expected behaviour that all tasks see only one GPU. With --gres=gpu:4, the output would be

CUDA_VISIBLE_DEVICES=0,1,2,3
CUDA_VISIBLE_DEVICES=0,1,2,3
CUDA_VISIBLE_DEVICES=0,1,2,3
CUDA_VISIBLE_DEVICES=0,1,2,3

To get what you want, you can use a wrapper script, or modify your srun command like this:

srun bash -c 'CUDA_VISIBLE_DEVICES=$SLURM_PROCID env' | grep CUDA

then you will get

CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=1
CUDA_VISIBLE_DEVICES=2
CUDA_VISIBLE_DEVICES=3
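
If the goal is to launch an MPI application rather than env, the same idea can be packaged in a small wrapper script; the following is a sketch (the script name, and the use of SLURM_LOCALID rather than SLURM_PROCID for the node-local index, are our own choices, so adapt as needed):

#!/bin/bash
# set_gpu.sh -- pin this task to one GPU based on its node-local task id,
# then exec the real program so arguments, signals and exit codes pass through.
export CUDA_VISIBLE_DEVICES=$SLURM_LOCALID
exec "$@"

It would then be launched as srun ./set_gpu.sh ./my_mpi_app, keeping a single srun per job step.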

Answer by Jehandad:

To accomplish one GPU per task you need to use the --gpu-bind switch of the srun command. For example, if I have three nodes with 8 GPUs each and I wish to run eight tasks per node each bound to a unique GPU, the following command would do the trick:

srun -p gfx908_120 -n 24 -G gfx908_120:24 --gpu-bind=single:1  -l bash -c 'echo $(hostname):$ROCR_VISIBLE_DEVICES'
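
Translated to the question's single node with 4 GPUs, an equivalent call might look like the following (untested here, and it assumes a Slurm release new enough to support --gpus/--gpu-bind; CUDA_VISIBLE_DEVICES is shown because the question's nodes are NVIDIA based):

srun -n 4 --gpus=4 --gpu-bind=single:1 -l bash -c 'echo $(hostname):$CUDA_VISIBLE_DEVICES'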

Answer by TexasDex:

This feature is planned for 19.05. See https://bugs.schedmd.com/show_bug.cgi?id=4979 for details.
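
On 19.05 or newer the per-task request can be expressed directly, so something along these lines should work without any wrapper (a sketch, assuming the new cons_tres GPU options are available on your cluster):

salloc -n 4 -c 2 --gpus-per-task=1
srun env | grep CUDA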

Be warned that the 'srun bash...' solution suggested above will break if your job doesn't request all of the GPUs on the node, because another job may be in control of GPU 0.