Training a RetinaNet with a varying number of bounding boxes per image


I am trying to train a retinanet_resnet50_fpn_v2 in PyTorch, but I am running into a problem with a varying number of bounding boxes per image.

I am training on the SKU110K dataset, whose images can have a varying number of bounding boxes. For example, image 1 contains 35 bounding boxes, image 2 contains 79, and image 3 contains 132.

When I try to load batches of images with their corresponding bounding boxes through a DataLoader, I get:

stack expects each tensor to be equal size, but got [74, 4] at entry 0 and [128, 4] at entry 1
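
That error comes from the DataLoader's default collate_fn, which calls torch.stack on the per-image box tensors, and stacking only works when every tensor in the batch has the same shape. A minimal reproduction of the underlying failure:

import torch

boxes_a = torch.zeros(74, 4)   # 74 boxes in the first image
boxes_b = torch.zeros(128, 4)  # 128 boxes in the second image

# RuntimeError: stack expects each tensor to be equal size,
# but got [74, 4] at entry 0 and [128, 4] at entry 1
torch.stack([boxes_a, boxes_b])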

I created a collate function to pad the bounding boxes so that they were all the same shape, like so:


import torch


def collater(data):
    imgs = [s["img"] for s in data]
    annots = [s["boxes"] for s in data]
    labels = [s["label"] for s in data]

    # Pad every image's boxes up to the largest box count in the batch.
    max_num_annots = max(annot.shape[0] for annot in annots)

    if max_num_annots > 0:
        annot_padded = torch.zeros((len(annots), max_num_annots, 4))

        for idx, annot in enumerate(annots):
            if annot.shape[0] > 0:
                # as_tensor handles both numpy arrays and tensors
                annot_padded[idx, : annot.shape[0], :] = torch.as_tensor(annot)
    else:
        # No boxes anywhere in the batch: one all-zero placeholder box per image.
        annot_padded = torch.zeros((len(annots), 1, 4))

    return {"img": imgs, "boxes": annot_padded, "labels": labels}

I then get:

AssertionError: All bounding boxes should have positive height and width. Found invalid box [1.25, 1.25, 1.25, 1.25] for target at index 0.

The padding itself appears to be the problem: the all-zero rows are degenerate boxes with zero width and height, and torchvision validates every target box before computing the loss.

What is the correct way to train this network given that each image can have a varying number of bounding boxes?
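
For reference, torchvision's detection models (including retinanet_resnet50_fpn_v2) are built for exactly this situation: in training mode they take a list of image tensors and a list of per-image target dicts with "boxes" and "labels" keys, so nothing needs to be padded; the collate function only has to keep the batch as lists. Below is a minimal sketch of that pattern. DummyShelfDataset is a hypothetical stand-in for the SKU110K loading code, and num_classes=2 assumes one foreground class plus background.

import torch
from torch.utils.data import DataLoader, Dataset
from torchvision.models.detection import retinanet_resnet50_fpn_v2


class DummyShelfDataset(Dataset):
    # Hypothetical stand-in for SKU110K: each image carries a different number of boxes.
    def __len__(self):
        return 4

    def __getitem__(self, idx):
        img = torch.rand(3, 800, 800)
        n = int(torch.randint(1, 10, (1,)))          # varying box count per image
        x1y1 = torch.rand(n, 2) * 400
        wh = torch.rand(n, 2) * 100 + 1.0            # strictly positive width/height
        boxes = torch.cat([x1y1, x1y1 + wh], dim=1)  # xyxy format, as torchvision expects
        labels = torch.ones(n, dtype=torch.int64)    # single foreground class
        return img, {"boxes": boxes, "labels": labels}


def collate_fn(batch):
    # Keep images and targets as lists instead of stacking, so ragged box counts are fine.
    return tuple(zip(*batch))


model = retinanet_resnet50_fpn_v2(weights=None, num_classes=2)
loader = DataLoader(DummyShelfDataset(), batch_size=2, collate_fn=collate_fn)

model.train()
for images, targets in loader:
    # In train mode the model returns a dict of losses, not detections.
    loss_dict = model(list(images), list(targets))
    sum(loss_dict.values()).backward()  # optimizer step omitted for brevity

Because each target dict keeps its own boxes tensor, images with 35, 79, or 132 boxes can share a batch without any dummy entries.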
