Expected to train the model on 4 GPUs, but 3/4 of the data disappears(?) somewhere
I looked through the other issues but to no avail, so I'm asking my question here.
I am trying to train the model on 4 GPUs using torch.nn.DataParallel.
The batch size is 64, so the data on each GPU should have a shape of [16, ..., ...].
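For context, the setup is roughly the following (a minimal sketch, with ToyModel standing in for my actual model, which takes src_vid of shape [batch, 75, 256]):

import torch
import torch.nn as nn

# Tiny stand-in for my model; the real one is bigger but is wrapped the same way.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(256, 256)

    def forward(self, src_vid):
        return self.proj(src_vid)

model = nn.DataParallel(ToyModel(), device_ids=[0, 1, 2, 3]).to('cuda:0')

src_vid = torch.randn(64, 75, 256, device='cuda:0')  # full batch on the default device
out = model(src_vid)  # should be scattered into four [16, 75, 256] chunks, one per GPU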
The strange thing is that the data IS being distributed: GPU-Util in nvidia-smi shows that calculations are being performed on each GPU...
When I run the following line on the input (before any calculations):
print(src_vid.get_device(), src_vid.shape)
it prints 0 torch.Size([16, 75, 256]), i.e. it shows data only on the first GPU.
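To be clear about where that print sits: it is the first statement of the model's forward (a sketch continuing ToyModel from above; my real forward takes more arguments):

    def forward(self, src_vid):
        # DataParallel calls forward once per replica, so I expected four lines here,
        # one for each of devices 0-3, each with torch.Size([16, 75, 256]).
        print(src_vid.get_device(), src_vid.shape)
        return self.proj(src_vid)

Instead, the only line I get is the one for device 0.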
The same happens for the output: it is supposed to be gathered on a single GPU (device 0 by default), but again it has a shape of [16, ..., ...].
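This is how I check it (sketch, again with the toy model from above):

out = model(src_vid)
# DataParallel should gather the per-GPU outputs back onto cuda:0, so I expected
# a batch dimension of 64 here, but the batch dimension is only 16.
print(out.get_device(), out.shape)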
I trained an almost identical model in the same virtual environment and everything was fine...