I am trying to create a PyTorch DataLoader for my dataset. Each image contains a certain number of cars with a bounding box for each of them, and not all images have the same number of bounding boxes.
You probably won't be able to run it, but here is some info. This is my dataset class:
import os

import torch
from skimage import io
from torch.utils.data import Dataset
from torchvision import tv_tensors

class AGR_Dataset(Dataset):
    def __init__(self, annotations_root, img_root, transform=None):
        """
        Arguments:
            annotations_root (string): Directory with the annotation .txt files.
            img_root (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.annotations_root = annotations_root
        self.img_root = img_root
        self.transform = transform

    def __len__(self):
        # one annotation file per image, so the dataset length is the image count
        return len(os.listdir(self.img_root))

    def __getitem__(self, idx):
        # idx is an integer index into the directory listing of images
        if torch.is_tensor(idx):
            idx = idx.tolist()
        idx_name = os.listdir(self.img_root)[idx]
        img_name = os.path.join(self.img_root, idx_name)
        annotation_data = os.path.join(self.annotations_root, f"{idx_name.removesuffix('.jpg')}.txt")
        image = io.imread(img_name)

        with open(annotation_data, 'r') as file:
            lines = file.readlines()

        img_data = []
        img_labels = []
        for line in lines:
            # each annotation line is "class cx cy w h", space-separated
            values = [float(num) for num in line.split()]
            img_labels.append(int(values[0]))
            img_data.append(values[1:])

        boxes = tv_tensors.BoundingBoxes(img_data, format='CXCYWH',
                                         canvas_size=(image.shape[0], image.shape[1]))
        # sample = {'image': image, 'bbox': boxes, 'labels': img_labels}
        sample = {'image': image, 'bbox': boxes}

        if self.transform:
            sample = self.transform(sample)

        print(sample['image'].shape)
        print(sample['bbox'].shape)
        # print(sample['labels'].shape)
        return sample
I define my transforms and create the DataLoader:
from torch.utils.data import DataLoader
from torchvision.transforms import v2

data_transform = v2.Compose([
    v2.ToImage(),
    # v2.Resize(680),
    v2.RandomResizedCrop(size=(680, 680), antialias=True),
    # v2.ToDtype(torch.float32, scale=True),
    v2.ToTensor()
])
transformed_dataset = AGR_Dataset(f'{annotations_path}/test/',
                                  f'{img_path}/test/',
                                  transform=data_transform)
dataloader = DataLoader(transformed_dataset, batch_size=2,
                        shuffle=False, num_workers=0)
Then I am trying to iterate through it with this, and eventually view an image with its bounding boxes:
for i, sample in enumerate(dataloader):
    print(i, sample)
    print(i, sample['image'].size(), sample['bbox'].size())
    if i == 4:
        break
With a batch size of 1 it runs properly; with a batch size of 2, I get this error:
torch.Size([3, 680, 680])
torch.Size([12, 4])
torch.Size([3, 680, 680])
torch.Size([259, 4])
RuntimeError: stack expects each tensor to be equal size, but got [12, 4] at entry 0 and [259, 4] at entry 1
- I believe this is because the number of bounding boxes differs between images, but how do I overcome this?
- Do I need the ToTensor in my transforms? I am starting to think I don't, as v2 uses ToImage() and ToTensor is being deprecated (see the sketch below).
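From the v2 docs, I believe ToTensor is deprecated there and the documented replacement is ToImage() followed by ToDtype(torch.float32, scale=True). A sketch of what I think the pipeline should look like (same crop size as above, untested on my data):

import torch
from torchvision.transforms import v2

# ToImage converts the ndarray/PIL input to a tv_tensors.Image (a CHW tensor),
# and ToDtype(scale=True) rescales uint8 [0, 255] to float32 [0, 1]; together
# they replace the deprecated ToTensor.
data_transform = v2.Compose([
    v2.ToImage(),
    v2.RandomResizedCrop(size=(680, 680), antialias=True),
    v2.ToDtype(torch.float32, scale=True),
])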
Any other comments or help would be appreciated. I am not sure how to create a minimal working example, but I will continue to try.
What I have tried
I have tried not wrapping the bounding boxes as tensors, by commenting out the tv_tensors.BoundingBoxes line in the dataset, but then for some reason my resize doesn't work properly.
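I think this is expected: the v2 transforms only remap box coordinates when the boxes are wrapped as tv_tensors.BoundingBoxes, while plain lists are passed through untouched. A small sketch of the difference, with made-up image and box sizes purely for illustration:

import torch
from torchvision import tv_tensors
from torchvision.transforms import v2

transform = v2.RandomResizedCrop(size=(680, 680), antialias=True)

# hypothetical 1080x1920 image and a single box, just for illustration
image = tv_tensors.Image(torch.randint(0, 256, (3, 1080, 1920), dtype=torch.uint8))
raw_box = [[960.0, 540.0, 100.0, 50.0]]  # cx, cy, w, h as a plain list

# Wrapped as BoundingBoxes: the crop/resize remaps the coordinates as well.
boxes = tv_tensors.BoundingBoxes(raw_box, format='CXCYWH', canvas_size=(1080, 1920))
out_image, out_boxes = transform(image, boxes)
print(out_boxes)   # coordinates adjusted to the 680x680 crop

# Left as a plain list: passed through unchanged, so the box no longer
# lines up with the transformed image.
out_image2, out_raw = transform(image, raw_box)
print(out_raw)     # identical to raw_box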
I just tried splitting the bounding boxes and images like this in the dataset's __getitem__:
sample = image
target = {'bbox': boxes, 'labels': img_labels}
No luck with that
I have found an answer to the problem.
In the DataLoader, the collate_fn needs to be set to the collate_fn from torchvision's detection reference utilities (references/detection/utils.py), which batches the samples into tuples instead of trying to stack them!
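A minimal sketch of that fix, assuming __getitem__ is changed to return the (image, target) pair from the splitting attempt above; the collate_fn body is the same one-liner used in torchvision's references/detection/utils.py:

from torch.utils.data import DataLoader

def collate_fn(batch):
    # batch is a list like [(image_0, target_0), (image_1, target_1)];
    # regroup it into (images, targets) tuples instead of stacking, so the
    # per-image box tensors never have to share a shape
    return tuple(zip(*batch))

dataloader = DataLoader(transformed_dataset, batch_size=2,
                        shuffle=False, num_workers=0,
                        collate_fn=collate_fn)

for i, (images, targets) in enumerate(dataloader):
    print(i, [img.shape for img in images], [t['bbox'].shape for t in targets])
    if i == 4:
        break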