I am trying to fine tune my yolov7-tiny pytorch model on my custom dataset which contains 40k images on google colab it gives me the error on first epoch
Command
!python train.py --batch 16 --cfg cfg/training/yolov7-tiny.yaml --epochs 30 --data data/custom_data.yaml --weights 'yolov7-tiny.pt' --device 'cpu' --hyp /content/yolov7/data/hyp.scratch.tiny.yaml
Error
/content/yolov7
2023-11-23 09:41:50.546886: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-23 09:41:50.546968: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-23 09:41:50.547001: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-23 09:41:50.556142: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-23 09:41:51.761396: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
YOLOR v0.1-128-ga207844 torch 2.1.0+cu118 CPU
Namespace(weights='yolov7-tiny.pt', cfg='cfg/training/yolov7-tiny.yaml', data='data/custom_data.yaml', hyp='/content/yolov7/data/hyp.scratch.tiny.yaml', epochs=30, batch_size=16, img_size=[640, 640], rect=False, resume=False, nosave=False, notest=False, noautoanchor=False, evolve=False, bucket='', cache_images=False, image_weights=False, device='cpu', multi_scale=False, single_cls=False, adam=False, sync_bn=False, local_rank=-1, workers=8, project='runs/train', entity=None, name='exp', exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias='latest', freeze=[0], v5_metric=False, world_size=1, global_rank=-1, save_dir='runs/train/exp2', total_batch_size=16)
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.05, copy_paste=0.0, paste_in=0.05, loss_ota=1
wandb: Install Weights & Biases for YOLOR logging with 'pip install wandb' (recommended)
from n params module arguments
0 -1 1 928 models.common.Conv [3, 32, 3, 2, None, 1, LeakyReLU(negative_slope=0.1)]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2, None, 1, LeakyReLU(negative_slope=0.1)]
2 -1 1 2112 models.common.Conv [64, 32, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
3 -2 1 2112 models.common.Conv [64, 32, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
4 -1 1 9280 models.common.Conv [32, 32, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
5 -1 1 9280 models.common.Conv [32, 32, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
6 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
7 -1 1 8320 models.common.Conv [128, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
8 -1 1 0 models.common.MP []
9 -1 1 4224 models.common.Conv [64, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
10 -2 1 4224 models.common.Conv [64, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
11 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
12 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
13 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
15 -1 1 0 models.common.MP []
16 -1 1 16640 models.common.Conv [128, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
17 -2 1 16640 models.common.Conv [128, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
18 -1 1 147712 models.common.Conv [128, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
19 -1 1 147712 models.common.Conv [128, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
20 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
21 -1 1 131584 models.common.Conv [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
22 -1 1 0 models.common.MP []
23 -1 1 66048 models.common.Conv [256, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
24 -2 1 66048 models.common.Conv [256, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
25 -1 1 590336 models.common.Conv [256, 256, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
26 -1 1 590336 models.common.Conv [256, 256, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
27 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
28 -1 1 525312 models.common.Conv [1024, 512, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
29 -1 1 131584 models.common.Conv [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
30 -2 1 131584 models.common.Conv [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
31 -1 1 0 models.common.SP [5]
32 -2 1 0 models.common.SP [9]
33 -3 1 0 models.common.SP [13]
34 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
35 -1 1 262656 models.common.Conv [1024, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
36 [-1, -7] 1 0 models.common.Concat [1]
37 -1 1 131584 models.common.Conv [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
38 -1 1 33024 models.common.Conv [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
39 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
40 21 1 33024 models.common.Conv [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
41 [-1, -2] 1 0 models.common.Concat [1]
42 -1 1 16512 models.common.Conv [256, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
43 -2 1 16512 models.common.Conv [256, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
44 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
45 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
46 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
47 -1 1 33024 models.common.Conv [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
48 -1 1 8320 models.common.Conv [128, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
49 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
50 14 1 8320 models.common.Conv [128, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
51 [-1, -2] 1 0 models.common.Concat [1]
52 -1 1 4160 models.common.Conv [128, 32, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
53 -2 1 4160 models.common.Conv [128, 32, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
54 -1 1 9280 models.common.Conv [32, 32, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
55 -1 1 9280 models.common.Conv [32, 32, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
56 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
57 -1 1 8320 models.common.Conv [128, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
58 -1 1 73984 models.common.Conv [64, 128, 3, 2, None, 1, LeakyReLU(negative_slope=0.1)]
59 [-1, 47] 1 0 models.common.Concat [1]
60 -1 1 16512 models.common.Conv [256, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
61 -2 1 16512 models.common.Conv [256, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
62 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
63 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
64 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
65 -1 1 33024 models.common.Conv [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
66 -1 1 295424 models.common.Conv [128, 256, 3, 2, None, 1, LeakyReLU(negative_slope=0.1)]
67 [-1, 37] 1 0 models.common.Concat [1]
68 -1 1 65792 models.common.Conv [512, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
69 -2 1 65792 models.common.Conv [512, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
70 -1 1 147712 models.common.Conv [128, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
71 -1 1 147712 models.common.Conv [128, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
72 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
73 -1 1 131584 models.common.Conv [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
74 57 1 73984 models.common.Conv [64, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
75 65 1 295424 models.common.Conv [128, 256, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
76 73 1 1180672 models.common.Conv [256, 512, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
77 [74, 75, 76] 1 82076 models.yolo.IDetect [25, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
/usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Model Summary: 263 layers, 6079932 parameters, 6079932 gradients, 13.4 GFLOPS
Transferred 330/344 items from yolov7-tiny.pt
Scaled weight_decay = 0.0005
Optimizer groups: 58 .bias, 58 conv.weight, 61 other
train: Scanning '/content/drive/MyDrive/combined_data_14sep/combined_data_14sep/train/labels.cache' images and labels... 28588 found, 1 missing, 6 empty, 2 corrupted: 100% 28590/28590 [00:00<?, ?it/s]
val: Scanning '/content/drive/MyDrive/combined_data_14sep/combined_data_14sep/val/labels.cache' images and labels... 6163 found, 2 missing, 0 empty, 1 corrupted: 100% 6165/6165 [00:00<?, ?it/s]
autoanchor: Analyzing anchors... anchors/target = 4.01, Best Possible Recall (BPR) = 0.9997
Image sizes 640 train, 640 test
Using 8 dataloader workers
Logging results to runs/train/exp2
Starting training for 30 epochs...
Epoch gpu_mem box obj cls total labels img_size
0/29 0G 0.07406 0.02084 0.09498 0.1899 44 640: 2% 42/1787 [14:15<3:01:29, 6.24s/it]libpng warning: iCCP: known incorrect sRGB profile
0/29 0G 0.06949 0.01645 0.08452 0.1705 35 640: 8% 150/1787 [25:10<2:49:18, 6.21s/it]libpng warning: iCCP: known incorrect sRGB profile
0/29 0G 0.06428 0.01585 0.07459 0.1547 34 640: 17% 295/1787 [39:58<2:37:10, 6.32s/it]libpng warning: iCCP: known incorrect sRGB profile
0/29 0G 0.0612 0.01587 0.06808 0.1452 53 640: 24% 425/1787 [53:20<2:15:37, 5.97s/it]libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: cHRM chunk does not match sRGB
0/29 0G 0.06106 0.0159 0.06779 0.1447 55 640: 24% 433/1787 [54:10<2:20:43, 6.24s/it]libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: cHRM chunk does not match sRGB
0/29 0G 0.0596 0.01599 0.06443 0.14 66 640: 28% 500/1787 [1:00:59<2:16:03, 6.34s/it]libpng warning: iCCP: known incorrect sRGB profile
0/29 0G 0.05856 0.01605 0.06269 0.1373 65 640: 31% 555/1787 [1:06:34<2:12:26, 6.45s/it]libpng warning: iCCP: profile 'ICC Profile': 0h: PCS illuminant is not D50
0/29 0G 0.0559 0.01591 0.05811 0.1299 57 640: 41% 735/1787 [1:24:59<1:46:40, 6.08s/it]libpng warning: iCCP: profile 'ICC Profile': 0h: PCS illuminant is not D50
0/29 0G 0.05285 0.01518 0.04852 0.1166 62 640: 66% 1184/1787 [2:12:21<1:06:22, 6.60s/it]libpng warning: iCCP: profile 'ICC Profile': 0h: PCS illuminant is not D50
0/29 0G 0.05234 0.01525 0.04795 0.1155 44 640: 71% 1268/1787 [2:22:00<1:00:09, 6.95s/it]libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: cHRM chunk does not match sRGB
0/29 0G 0.0511 0.01518 0.045 0.1113 78 640: 88% 1574/1787 [2:55:58<24:06, 6.79s/it]libpng warning: iCCP: known incorrect sRGB profile
0/29 0G 0.05095 0.01517 0.04469 0.1108 52 640: 90% 1611/1787 [3:00:03<21:35, 7.36s/it]libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: cHRM chunk does not match sRGB
0/29 0G 0.05036 0.01521 0.04345 0.109 41 640: 100% 1787/1787 [3:19:39<00:00, 6.70s/it]
Class Images Labels P R [email protected] [email protected]:.95: 73% 141/193 [11:14<04:08, 4.78s/it]
Traceback (most recent call last):
File "/content/yolov7/train.py", line 616, in <module>
train(hyp, opt, device, tb_writer)
File "/content/yolov7/train.py", line 415, in train
results, maps, times = test.test(data_dict,
File "/content/yolov7/test.py", line 119, in test
loss += compute_loss([x.float() for x in train_out], targets)[1][:3] # box, obj, cls
File "/content/yolov7/utils/loss.py", line 477, in __call__
t[range(n), tcls[i]] = self.cp
IndexError: index 5283828224 is out of bounds for dimension 1 with size 25
I'm seeking assistance in identifying and resolving this issue. I'm not entirely certain where the root cause lies, but I'm open to suggestions and guidance from experienced individuals.