PyTorch 1.13 DataLoader is significantly faster than PyTorch 2.0.1

I've noticed that the PyTorch 2.0.1 DataLoader is significantly slower than the PyTorch 1.13 DataLoader, especially when num_workers is set to something other than 0. From what I've read, I suspect this is due to a change in how PyTorch handles multiprocessing in 2.0.1. My understanding is that in 1.13 the DataLoader uses a separate process for each worker, whereas in 2.0.1 it uses a thread pool to manage the workers.
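
Whether the workers are separate processes can be checked directly on a given install rather than inferred: record `os.getpid()` inside `__getitem__` and compare it with the main process's PID. This is a minimal diagnostic sketch; `WhoLoadsMe` and `worker_pids` are hypothetical names, not part of PyTorch. `get_worker_info()` is the `torch.utils.data` helper that returns `None` when loading happens in the main process.

```python
import os

from torch.utils.data import DataLoader, Dataset, get_worker_info


class WhoLoadsMe(Dataset):
    """Hypothetical diagnostic dataset: each sample records the PID that produced it."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        info = get_worker_info()  # None when num_workers=0 (main process)
        worker_id = -1 if info is None else info.id
        return idx, worker_id, os.getpid()


def worker_pids(num_workers: int) -> set:
    """Return the set of PIDs that produced samples for a loader with `num_workers`."""
    # batch_size=None disables batching, so each item is one (idx, worker_id, pid) sample.
    loader = DataLoader(WhoLoadsMe(), batch_size=None, num_workers=num_workers)
    return {pid for _idx, _worker_id, pid in loader}


if __name__ == "__main__":
    print("main process PID:", os.getpid())
    print("PIDs with num_workers=0:", worker_pids(0))
    print("PIDs with num_workers=4:", worker_pids(4))
```

If `worker_pids(4)` returns four PIDs that all differ from the main process PID, the workers are running as separate processes on that version.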

I'm using a simple DataLoader, but I need to stick to PyTorch 2.0.1 for other reasons. I'm looking for a workaround to speed up my DataLoader.

Steps to reproduce:

1. Load a dataset using the PyTorch 1.13 DataLoader with the following settings:
   - num_workers: 32
   - pin_memory: True
2. Time the data loading process.
3. Repeat with PyTorch 2.0.1 using the same settings.

Expected behavior:

Data loading should be at least as fast with the PyTorch 2.0.1 DataLoader as with PyTorch 1.13.

Actual behavior:

Data loading is significantly slower with the PyTorch 2.0.1 DataLoader.
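
A minimal timing script for the reproduction steps might look like the following. It is a sketch, not the exact code behind this report: `SyntheticDataset` is a hypothetical stand-in for the real dataset, and the batch size of 32 is an assumption since none is stated. Timing the first batch separately helps show whether the slowdown comes from worker start-up or from steady-state loading.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SyntheticDataset(Dataset):
    """Hypothetical stand-in for the real dataset: random float tensors of a fixed shape."""

    def __init__(self, length: int = 2048, shape=(3, 32, 32)):
        self.length = length
        self.shape = shape

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # A real dataset would read/decode/augment here; this just allocates a tensor.
        return torch.randn(self.shape), idx


def time_loader(dataset, num_workers=32, pin_memory=True, batch_size=32):
    """Iterate once over `dataset`; return (first_batch_s, total_s, n_batches)."""
    loader = DataLoader(dataset, batch_size=batch_size,
                        num_workers=num_workers, pin_memory=pin_memory)
    start = time.perf_counter()
    first_batch_s = None
    n_batches = 0
    # iter(loader) runs inside the timed region, so worker start-up is included.
    for _batch in loader:
        if first_batch_s is None:
            first_batch_s = time.perf_counter() - start
        n_batches += 1
    total_s = time.perf_counter() - start
    return first_batch_s, total_s, n_batches


if __name__ == "__main__":
    # Settings from the report; run this same file under 1.13 and under 2.0.1.
    first, total, n = time_loader(SyntheticDataset(), num_workers=32, pin_memory=True)
    print(f"torch {torch.__version__}: first batch {first:.3f}s, "
          f"{n} batches in {total:.3f}s")
```

Run it once in a 1.13 environment and once in a 2.0.1 environment on the same machine. If mainly the first-batch time differs, the regression is probably in worker start-up rather than per-batch loading. Note that pin_memory only takes effect when CUDA is available.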

Environment:

- PyTorch version: 1.13, 2.0.1
- Python version: 3.9
- Operating system: Ubuntu 20.04

Question:

Is there a workaround to speed up the PyTorch 2.0.1 DataLoader?

Additional notes:

I've tried reducing the number of workers, but this doesn't significantly improve the performance. I've also tried using a smaller batch size, but this also doesn't significantly improve the performance. I appreciate any help you can provide.
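
The effect of worker count and batch size can be measured in one run with a small sweep, repeated under each PyTorch version. This is a sketch under assumptions: the `TensorDataset` of random tensors is a placeholder for the real data, and the value grids are illustrative choices.

```python
import itertools
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def throughput(dataset, num_workers, batch_size):
    """Return samples/second for one full pass over `dataset` (worker start-up included)."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    n_samples = sum(len(labels) for _images, labels in loader)
    return n_samples / (time.perf_counter() - start)


def sweep(dataset, worker_counts=(0, 4, 8, 16, 32), batch_sizes=(32, 64, 128)):
    """Time every (num_workers, batch_size) pair; return {(workers, batch): samples_per_s}."""
    return {
        (w, b): throughput(dataset, w, b)
        for w, b in itertools.product(worker_counts, batch_sizes)
    }


if __name__ == "__main__":
    # Placeholder data (hypothetical); swap in the real Dataset for meaningful numbers.
    data = TensorDataset(torch.randn(1024, 3, 32, 32), torch.arange(1024))
    for (w, b), rate in sorted(sweep(data).items()):
        print(f"torch {torch.__version__}  workers={w:2d}  batch={b:3d}  {rate:,.0f} samples/s")
```

Comparing the output from the two versions shows whether 2.0.1 is slower across every configuration or only at high worker counts.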
