Out of memory when using multiple GPUs with larger batch_size in Caffe

I am fine-tuning VGG-Face (a very large model) with 8 TITAN Xp GPUs available. However, Caffe gives an out-of-memory error when I increase the batch_size. Here is what I did:

First, batch_size was set to 40 for the training stage, and it worked fine on a single GPU; the chosen GPU was nearly 100% utilized. Then I increased batch_size to 128 and trained with all 8 GPUs using

'./build/tools/caffe train -solver mysolver.prototxt -gpu all'
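
(For reference, batch_size is set in the data layer of my training prototxt; the simplified sketch below assumes an LMDB-backed Data layer, and the source path is just a placeholder.)

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param {
    source: "path/to/train_lmdb"  # placeholder path
    backend: LMDB
    batch_size: 128               # raised from 40, which ran fine on a single GPU
  }
}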

All the GPUs were fully utilized, as shown in nvidia-smi.jpg.

And Caffe gives me the following error:

F0906 03:41:32.776806 95655 parallel.cpp:90] Check failed: error ==cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
@     0x7f9a0832995d  google::LogMessage::Fail()
@     0x7f9a0832b6e0  google::LogMessage::SendToLog()
@     0x7f9a08329543  google::LogMessage::Flush()
@     0x7f9a0832c0ae  google::LogMessageFatal::~LogMessageFatal()
@     0x7f9a08abe825  caffe::GPUParams<>::GPUParams()
@     0x7f9a08abefd8  caffe::NCCL<>::NCCL()
@           0x40dc69  train()
@           0x40a8ed  main
@     0x7f9a06abf830  (unknown)
@           0x40b349  _start
Aborted (core dumped)

Theoretically, I should be able to train with batch_size = 40*8 = 320. (Please let me know if I am right here.)

So, how can I fully utilize the GPUs to accelerate my model training? Thanks in advance!

1 Answer

Answered by shubhamgoel27:

When using multiple GPUs, you don't need to increase the batch size in your prototxt. If your batch size is 40, Caffe will use that size on each GPU individually, effectively giving you a batch size of 40*8 = 320 without you having to change anything. Conversely, raising the prototxt value to 128 makes every GPU try to hold a batch of 128, which is why you run out of memory.
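
As a sketch, assuming a standard LMDB Data layer like the one described in the question, you simply keep the single-GPU setting:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param {
    source: "path/to/train_lmdb"  # placeholder path
    backend: LMDB
    batch_size: 40                # per-GPU batch; with '-gpu all' on 8 GPUs the effective batch is 40*8 = 320
  }
}

Each GPU then processes its own batch of 40 and the gradients are combined across GPUs (via NCCL, as seen in the stack trace above), so the effective per-iteration batch size is already 320.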