YOLOv8 multi-GPU usage doesn't increase the training speed


I'm repurposing an older PC with an Intel Pentium G4400 CPU, 16 GB of RAM and seven GTX 1080 Ti cards, and I want to train a YOLOv8 model on it. I know the CPU and RAM impose some limitations, but I hoped it would not be that bad. However, I've run into some issues. I've installed CUDA and Ultralytics, and training works with one GPU. When I train with more GPUs, though, the results are not what I expected: I would have thought the time per epoch would go down, but it stays about the same as with a single GPU. I scale the batch size with the number of GPUs in use. Training is just slow and inefficient. I also get warnings like this:

[W socket.cpp:697] [c10d] The client socket has failed to connect to ["name_of_pc"]: 54920 (system error: 10049 - The requested address is not valid in its context.).

I've googled the error but found no answer. I don't know whether the firewall is to blame. This is the code I'm using:

import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

import torch
from ultralytics import YOLO

if __name__ == '__main__':
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        cuda_devices = list(range(torch.cuda.device_count()))
        print("Using GPUs:", ", ".join([torch.cuda.get_device_name(i) for i in cuda_devices]))
    else:
        print("Using CPU")
        # num_workers = 0

    # Load the YOLOv8 model
    model = YOLO('yolov8n.pt')  # Automatic device selection

    # Train the model with specified and adjusted parameters
    results = model.train(
        data=r"D:\AI_Training\Datasets\siemens_numbers_mk3\data.yaml",
        imgsz=1280,  # Image size
        epochs=100,  # Number of epochs
        batch=64,  # Batch size; I adjust it to the number of GPUs in use
        name='yolov8_b16_e100_sz1280_dtmk3',  # Custom run name
        workers=2,  # Dataloader workers
        amp=True,  # Automatic Mixed Precision
        device=[0, 1, 2, 3],  # GPUs to train on; changed between attempts
    )

I change the device list for each attempt. Does anyone have any ideas how to make this work? Thanks!
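To make the CPU concern concrete, here is a rough back-of-the-envelope sketch of how the settings in the script scale with the number of GPUs. It assumes that `batch` is the total batch split evenly across the GPUs and that `workers` applies per training process, which is how I understand the Ultralytics documentation. `ddp_load_summary` is a hypothetical helper, not a library function, and `cpu_threads=2` is my assumption for the Pentium G4400.

```python
import os


def ddp_load_summary(total_batch, n_gpus, workers_per_gpu, cpu_threads=None):
    """Rough arithmetic for a multi-GPU run (hypothetical helper, not an Ultralytics API)."""
    if cpu_threads is None:
        cpu_threads = os.cpu_count() or 1
    # Assumption: the total batch is divided evenly between the GPUs.
    per_gpu_batch = total_batch // n_gpus
    # Assumption: every training process asks for its own set of loader workers.
    requested_workers = workers_per_gpu * n_gpus
    # One training process per GPU plus all requested loader workers.
    busy_processes = n_gpus + requested_workers
    return {
        "per_gpu_batch": per_gpu_batch,
        "requested_workers": requested_workers,
        "processes_per_cpu_thread": busy_processes / cpu_threads,
    }


if __name__ == "__main__":
    # Values from the script: batch=64, device=[0, 1, 2, 3], workers=2.
    # cpu_threads=2 is an assumption about the Pentium G4400.
    print(ddp_load_summary(64, 4, 2, cpu_threads=2))
```

With the values from the script this gives a per-GPU batch of 16 and 8 requested loader workers, i.e. 6 processes per CPU thread, so the CPU may be what keeps the epoch time from dropping as GPUs are added.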

I've tried changing the batch size, the number of GPUs and all sorts of other settings, but nothing helped.
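Regarding the socket warning: Windows error 10049 is WSAEADDRNOTAVAIL, which suggests the address the PC name resolves to might not belong to a local network interface. The following is only a diagnostic sketch using the Python standard library, not something taken from PyTorch or Ultralytics. `check_hostname_bindable` is a hypothetical helper name, and a successful bind here does not prove that the distributed setup is fine.

```python
import socket


def check_hostname_bindable(port=0):
    """Resolve this machine's hostname and try to bind a TCP socket to the result.

    Hypothetical diagnostic helper; port=0 lets the OS pick any free port.
    """
    host = socket.gethostname()
    result = {"hostname": host, "resolved": None, "bindable": False, "error": None}
    try:
        # Resolve the hostname to an IPv4 address.
        addr = socket.gethostbyname(host)
        result["resolved"] = addr
        # Binding fails if the address is not assigned to a local interface.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((addr, port))
            result["bindable"] = True
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result["error"] = str(exc)
    return result


if __name__ == "__main__":
    print(check_hostname_bindable())
```

If `bindable` comes back `False`, the hostname resolution (for example a stale hosts-file entry or an inactive network adapter) would be a more likely suspect than the firewall.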
