PyTorch: Send same batch of data to multiple GPUs, and perform ops on each GPU individually


I have a single dataloader feeding data to 4 models, each with a different hyperparameter and each loaded on a separate GPU. I want to reduce the bottleneck caused by data loading, so I intend to load the same batch prepared by the dataloader onto all GPUs so that each model can compute on it and perform a backprop step independently. I already cache the data in RAM when the dataloader is instantiated to avoid disk bottlenecks.
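
For reference, here is a minimal sketch of the setup described above (the dataset, hidden sizes, and model architecture are placeholders, not my actual code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the dataset already cached in RAM as plain tensors
dataset = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))
# pin_memory=True so host->device copies can later be made non-blocking
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=4, pin_memory=True)

# One model per GPU, each built with a different hyperparameter (hidden size here)
hidden_sizes = [64, 128, 256, 512]
models = [
    torch.nn.Sequential(
        torch.nn.Linear(128, h), torch.nn.ReLU(), torch.nn.Linear(h, 10)
    ).to(f"cuda:{i}")
    for i, h in enumerate(hidden_sizes)
]
optimizers = [torch.optim.Adam(m.parameters()) for m in models]
```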

I am trying to:

  1. Send/broadcast the same batch of data to N GPUs. I guess this is only possible if we can sync/wait for all GPUs to finish their ops on one batch before proceeding to the next one.
  2. Bonus: prefetching the next batch as soon as one is ready (up to P batches) could help ensure a continuous flow of data to the GPUs and avoid idle time. (A rough sketch of both ideas follows this list.)
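
Here is roughly what I have in mind for (1), reusing `loader`, `models`, and `optimizers` from the snippet above; I'm not sure whether this is the right way to overlap the copies and the per-GPU work:

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()

for x_cpu, y_cpu in loader:
    # Enqueue an async host->device copy of the *same* batch to every GPU;
    # with pin_memory=True on the loader, non_blocking=True lets these overlap.
    per_gpu = [
        (x_cpu.to(f"cuda:{i}", non_blocking=True),
         y_cpu.to(f"cuda:{i}", non_blocking=True))
        for i in range(len(models))
    ]
    # CUDA launches are asynchronous, so this loop only queues work on each
    # device; the GPUs compute concurrently while the DataLoader workers
    # prepare the next batch.
    for (x, y), model, opt in zip(per_gpu, models, optimizers):
        opt.zero_grad(set_to_none=True)
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    # Anything that reads a GPU result back on the CPU (loss.item(), metrics,
    # printing) synchronizes with that device at that point.
```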

I am not trying to achieve:

  1. Data Parallelism - Split a large batch into N parts, and compute each part on one GPU
  2. Model Parallelism - Split computation of a large model (that won't fit on one GPU) into N (or less) parts and place each part on one GPU.

Similar questions:

  1. This one is about making a Conv2D operation span across multiple GPUs
  2. This one is about executing different GPU computations in parallel, but I don't know if my problem can be solved with torch.cuda.Stream()
  3. This one is about loading different models, but it does not deal with sharing the same batch.
  4. This one is exactly what I'm asking about, but it's about CUDA/PCIe and is from 7 years ago.

Update:

I found a very similar question on the PyTorch discussion forum with a small example at the end that runs the forward pass via multiprocessing, but I'm wondering how to scale that approach to torch dataloaders.
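
To illustrate what I mean by scaling it (everything here — the `worker` function, queue sizes, and model — is an illustrative assumption on my part, not the code from that thread): one process per GPU would own its model, the main process would iterate a single DataLoader and put each CPU batch on every worker's bounded queue, and the queue's `maxsize` would give the "prefetch up to P batches" behaviour.

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset


def worker(rank, hidden, queue):
    # Each worker owns one GPU and one model variant
    device = f"cuda:{rank}"
    model = torch.nn.Sequential(
        torch.nn.Linear(128, hidden), torch.nn.ReLU(), torch.nn.Linear(hidden, 10)
    ).to(device)
    opt = torch.optim.Adam(model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss()
    while True:
        item = queue.get()          # blocks until the main process sends a batch
        if item is None:            # sentinel: no more batches
            break
        x, y = item
        x, y = x.to(device), y.to(device)
        opt.zero_grad(set_to_none=True)
        loss_fn(model(x), y).backward()
        opt.step()


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)   # required for CUDA in subprocesses

    dataset = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))
    loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=4)

    hidden_sizes = [64, 128, 256, 512]
    # maxsize bounds how many batches are prefetched ahead of the slowest GPU
    queues = [mp.Queue(maxsize=4) for _ in hidden_sizes]
    procs = [
        mp.Process(target=worker, args=(i, h, q))
        for i, (h, q) in enumerate(zip(hidden_sizes, queues))
    ]
    for p in procs:
        p.start()

    for batch in loader:
        for q in queues:
            q.put(batch)            # the same CPU batch goes to every worker
    for q in queues:
        q.put(None)
    for p in procs:
        p.join()
```

Is something along these lines reasonable, or is there a more idiomatic way to do this in PyTorch?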
