AWS Batch, how to ask for GPUs


I'm reading up on how to use AWS Batch to train deep learning models. The idea is that, once a model is built, I'd like to submit several jobs to explore the hyperparameter space a bit.

In this interesting blog post, the author created a compute environment of P2 instances and used it to train a convolutional neural network on MNIST. I am now wondering whether it's possible to request a specific number of GPUs, instead of vCPUs, in my job definition. That way I can be sure that my job gets the number of GPUs it needs. If not, is there a workaround?


There are 2 answers

transpattern (best answer)

AWS Batch has supported GPU allocation and scheduling since April 2019. With this feature, you can specify the number of GPUs your job needs. Batch also does GPU pinning for your jobs: if an instance has multiple GPUs, Batch can place multiple jobs (each asking for 1 GPU) on the same instance and run them concurrently. Here is an example of running GPU jobs with Batch GPU support: https://aws.amazon.com/blogs/compute/gpu-workloads-on-aws-batch/
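To make the GPU request concrete, here is a minimal sketch of what such a job definition could look like as a payload for boto3's `register_job_definition`. The GPU count goes into `containerProperties.resourceRequirements` as an entry of type `GPU`. The helper name `gpu_job_definition`, the job name, the image, and the resource numbers are all placeholders I made up for illustration; the actual AWS call is left commented out so the snippet runs without credentials.

```python
import json


def gpu_job_definition(name, image, gpus, vcpus, memory_mib, command):
    """Build a register_job_definition payload that requests `gpus` GPUs.

    Hypothetical helper for illustration only; not part of boto3.
    """
    if gpus < 1:
        raise ValueError("a GPU job should request at least one GPU")
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,
            "command": command,
            # Batch expects resource values as strings.
            "resourceRequirements": [
                {"type": "GPU", "value": str(gpus)},
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


# Placeholder values: swap in your own image, command, and sizes.
payload = gpu_job_definition(
    name="train-mnist-gpu",
    image="my-registry/train:latest",
    gpus=1,
    vcpus=4,
    memory_mib=16000,
    command=["python", "train.py"],
)
print(json.dumps(payload, indent=2))

# To actually register it (requires AWS credentials and boto3):
# import boto3
# boto3.client("batch").register_job_definition(**payload)
```

For a hyperparameter sweep, one option is to register the definition once and then submit one job per configuration, passing the hyperparameters through the container overrides of each submitted job.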

Sean Miller

I'm sure you've figured it out by now, but it can't hurt to answer, right? No, as of right now there's no way to specify a GPU count. You can, however, use the vCPU count in the job definition as a proxy for the number of GPUs.

For example, p2.xlarge instances have 4 vCPUs and 1 GPU. So if you want your job to have 1 GPU assigned to it, give that job definition 4 vCPUs. That way each p2.xlarge instance will only ever have one job running on it. That's probably more vCPUs than the job needs, but it's the only way right now to make sure that job, and only that job, has the GPU.
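The arithmetic behind this workaround can be sketched as follows. The helper `vcpus_for_gpus` and the `P2_SPECS` table are my own illustration: the p2.xlarge figures are the ones above, and the two larger P2 sizes are included only to show how the ratio scales. Note that on a multi-GPU instance this only caps how many jobs fit on the machine; it does not by itself pin a job to particular GPUs.

```python
# (vCPUs, GPUs) per instance type; every P2 size has 4 vCPUs per GPU.
P2_SPECS = {
    "p2.xlarge": (4, 1),
    "p2.8xlarge": (32, 8),
    "p2.16xlarge": (64, 16),
}


def vcpus_for_gpus(instance_type, gpus):
    """Return the vCPU count to request so a job claims `gpus` GPUs' worth
    of the instance (hypothetical helper for the vCPU-as-proxy workaround)."""
    total_vcpus, total_gpus = P2_SPECS[instance_type]
    if not 1 <= gpus <= total_gpus:
        raise ValueError(f"{instance_type} has {total_gpus} GPU(s)")
    return (total_vcpus // total_gpus) * gpus


# One GPU on a p2.xlarge means claiming the whole instance.
print(vcpus_for_gpus("p2.xlarge", 1))   # 4
print(vcpus_for_gpus("p2.8xlarge", 2))  # 8
```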

I've talked to people at AWS, and they keep saying that GPU specification might be coming soon, but who knows, really.