I have the following setup:
- Google Compute Engine VM with an Nvidia Tesla T4, 4 vCPUs, 16 GB RAM
- Conda environment where I am running inference_realesrgan.py cloned from https://github.com/xinntao/Real-ESRGAN
- Everything works as expected - the upscaling process is fine, but slow; the bottleneck is probably not the Nvidia GPU but the Python process, which uses only a single thread - probably for reading/converting/handling the images with OpenCV? (see the attached image)
- My goal: utilize all 4 CPUs during the upscale process
- My initial command:
conda activate base && python ~/Real-ESRGAN/inference_realesrgan.py -i ~/in -o ~/out -dn 1 -t 660 -g 0 --model_path ~/my_model.pth
The htop screenshot from the Google Cloud VM:

Any hint/help appreciated!
Python's Global Interpreter Lock (GIL) means that CPU-bound work in a single process effectively runs on one thread at a time, which matches what you see in htop. The multiprocessing module (or simply running several separate processes) might help, if there are enough images to warrant parallelizing.
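A minimal sketch of that idea, not tested against your exact repo version: split the input folder into one chunk per CPU core and launch a separate inference_realesrgan.py process for each chunk, reusing the flags from your original command. The chunk directories, NUM_WORKERS value, and symlink approach are my assumptions; you will likely need to lower the tile size (-t) or the number of workers so several processes fit into the T4's memory at once.

```python
# Sketch: run several copies of inference_realesrgan.py in parallel,
# each on its own subset of the input images. The GIL is not an issue
# here because each worker is a full separate Python process.
import subprocess
from multiprocessing import Pool
from pathlib import Path

NUM_WORKERS = 4                                   # one process per CPU core (assumption)
SCRIPT = Path.home() / "Real-ESRGAN/inference_realesrgan.py"
MODEL = Path.home() / "my_model.pth"
IN_DIR = Path.home() / "in"
OUT_DIR = Path.home() / "out"

def run_chunk(args):
    worker_id, files = args
    # Give each worker its own input folder with symlinks to its share of the images.
    chunk_dir = Path.home() / f"in_chunk_{worker_id}"   # hypothetical per-worker directory
    chunk_dir.mkdir(exist_ok=True)
    for f in files:
        link = chunk_dir / f.name
        if not link.exists():
            link.symlink_to(f)
    # Same flags as the original command, only the input directory differs.
    cmd = [
        "python", str(SCRIPT),
        "-i", str(chunk_dir),
        "-o", str(OUT_DIR),
        "-dn", "1",
        "-t", "660",
        "-g", "0",
        "--model_path", str(MODEL),
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    images = sorted(p for p in IN_DIR.iterdir() if p.is_file())
    # Round-robin the images into NUM_WORKERS chunks.
    chunks = [(i, images[i::NUM_WORKERS]) for i in range(NUM_WORKERS)]
    with Pool(NUM_WORKERS) as pool:
        print(pool.map(run_chunk, chunks))
```

The trade-off is that each process loads the model into GPU memory separately, so the speedup only pays off if the single-threaded CPU work (image decode/encode) really is the bottleneck, as your htop output suggests.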