I have a PyTorch task that works with DP (DataParallel):
The same network is replicated across multiple GPUs and shares the same weights, but each replica receives a different data batch, which speeds up training by increasing the effective batch size.
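For reference, here is a minimal sketch of my current single-network DP flow (the tiny model and the dummy batch are just placeholders for my real setup):

```python
import torch
import torch.nn as nn

# placeholder network; my real model is more complex
net = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# replicate the same weights onto 4 GPUs; each GPU processes a slice of the batch
net = nn.DataParallel(net, device_ids=[0, 1, 2, 3]).cuda()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

# dummy batch standing in for my real dataloader
inputs = torch.randn(64, 128).cuda()
targets = torch.randint(0, 10, (64,)).cuda()

optimizer.zero_grad()
loss = criterion(net(inputs), targets)  # batch is scattered across the 4 GPUs
loss.backward()
optimizer.step()
```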
Now I want to introduce several different networks into the training flow: net_A, net_B, and net_C. They have different architectures and don't share weights.
Is it possible to assign each network to a different node (each node has 4 GPUs), so that net_A still enjoys the DP speed-up on the 4 GPUs of node_A, net_B occupies node_B, and so on? A rough sketch of what I have in mind is below.
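To make the question concrete, this is roughly the intent; I don't yet know how to launch it across nodes. The `NODE_NAME` variable and the placeholder architectures are purely illustrative, the idea is one process per node, where each process builds only its own network and wraps it in DP over that node's 4 local GPUs:

```python
import os
import torch.nn as nn

# placeholder architectures standing in for my real net_A / net_B / net_C
BUILDERS = {
    "node_A": lambda: nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)),
    "node_B": lambda: nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten()),
    "node_C": lambda: nn.GRU(32, 64, batch_first=True),
}

# one process per node; NODE_NAME is a hypothetical variable set by whatever launcher I use
node = os.environ.get("NODE_NAME", "node_A")

# each node builds only its own network and data-parallelizes it over its 4 local GPUs
net = nn.DataParallel(BUILDERS[node](), device_ids=[0, 1, 2, 3]).cuda()
```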