Low classification accuracy - Faster R-CNN model with Detectron2


I am currently working on a university research project on object detection and synthetic dataset generation. As part of that project, I am trying to train a Faster R-CNN model on the GTSDB dataset using the Detectron2 framework, but I am struggling to get good results.

This is my first time working with anything related to AI, so I apologize for my limited understanding of the topic and for any lack of clarity in my question.

I followed the official Detectron2 Colab Notebook tutorial for training on a custom dataset (adapted for object detection with the Faster R-CNN architecture), as well as some other online tutorials that did not deviate much from the Colab Notebook, and ended up with the following training code:

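import os

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import DatasetCatalog
from detectron2.engine import DefaultTrainer

# helpers is my own module (its import and the load_dataset implementation are omitted)
DatasetCatalog.register("GTSDB_train", lambda: helpers.load_dataset(helpers.GTSDB_TRAIN_PATH))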

model = "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"

# CONFIG SETUP
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(model))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(model)

cfg.DATALOADER.NUM_WORKERS = 4
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512

cfg.MODEL.ROI_HEADS.NUM_CLASSES = 43 # dataset does contain 43 classes

# warmup for the first 200 iterations
cfg.SOLVER.WARMUP_ITERS = 200
cfg.SOLVER.WARMUP_FACTOR = 1.0/100 
cfg.SOLVER.WARMUP_METHOD = "linear"

cfg.SOLVER.BASE_LR = 0.001

# divide LR by 10 every 400 steps, train for 2000
cfg.SOLVER.MAX_ITER = 2000
cfg.SOLVER.GAMMA = 0.1
cfg.SOLVER.STEPS = [400, 800, 1200, 1600]

cfg.DATASETS.TRAIN = ("GTSDB_train",)
cfg.DATASETS.TEST = ()

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)

trainer.train()
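
For reference, the learning-rate schedule these solver settings imply can be worked out by hand. The sketch below is a plain-Python approximation of a linear-warmup, multi-step schedule, not Detectron2's own scheduler code. The function name `lr_at` is hypothetical, and the constants are copied from the config above.

```python
from bisect import bisect_right

# Values copied from the solver config in the question.
BASE_LR = 0.001
WARMUP_ITERS = 200
WARMUP_FACTOR = 1.0 / 100
GAMMA = 0.1
STEPS = [400, 800, 1200, 1600]


def lr_at(iteration: int) -> float:
    """Approximate learning rate at a given iteration (hypothetical helper)."""
    if iteration < WARMUP_ITERS:
        # Linear warmup: ramp the multiplier from WARMUP_FACTOR up to 1.
        alpha = iteration / WARMUP_ITERS
        warmup = WARMUP_FACTOR * (1 - alpha) + alpha
    else:
        warmup = 1.0
    # Number of decay milestones already reached at this iteration.
    decays = bisect_right(STEPS, iteration)
    return BASE_LR * warmup * GAMMA ** decays


for it in (0, 100, 200, 400, 800, 1200, 1600, 1999):
    print(f"iter {it:>4}: lr = {lr_at(it):.2e}")
```

Under these assumptions the rate drops by a factor of 10 at each milestone and ends at roughly 1e-7 for the last 400 iterations, so the later part of the 2000-iteration run uses a very small step size.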

I used the following code to check if the dataset was being registered correctly:

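import random

import cv2
from detectron2.utils.visualizer import Visualizer
# helpers is my own module (import omitted)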

dataset = helpers.GTSDB_train_dataset()
metadata = helpers.get_GTSDB_metadata() # metadata only contains class names

for d in random.sample(dataset, 5):
    img = cv2.imread(d['file_name'])
    visualizer = Visualizer(img[:,:,::-1], metadata=metadata, scale=1.2)
    out = visualizer.draw_dataset_dict(d)
    cv2.imshow("img", out.get_image()[:,:,::-1])
    cv2.waitKey(0)

As seen in this example, the dataset seems to be registered correctly, with the proper bounding boxes and class names.
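
A visual spot check only covers a handful of images. As a complementary check, the annotations per class could be counted. The sketch below assumes the records follow Detectron2's standard dataset-dict format (an `annotations` list whose entries carry a `category_id`). The function `count_instances_per_class` and the toy records are hypothetical and not part of the original code.

```python
from collections import Counter


def count_instances_per_class(dataset_dicts, class_names):
    """Count annotated boxes per class name (hypothetical helper)."""
    counts = Counter(
        ann["category_id"]
        for record in dataset_dicts
        for ann in record.get("annotations", [])
    )
    # Include classes with zero instances so gaps are visible.
    return {name: counts.get(i, 0) for i, name in enumerate(class_names)}


# Toy records standing in for the real dataset dicts (illustrative only).
toy_dataset = [
    {"file_name": "a.png", "annotations": [{"category_id": 0}, {"category_id": 2}]},
    {"file_name": "b.png", "annotations": [{"category_id": 0}]},
]
toy_names = ["class_0", "class_1", "class_2"]

print(count_instances_per_class(toy_dataset, toy_names))
```

With the real data, the same function could be called on the list returned by the dataset loader together with the class names stored in the metadata.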

Some TensorBoard results after training:

[Image: rcnn tensorboard results after training]

[Image: total loss over time]

As shown in the first image above, cls_accuracy seems to have stabilized at a good value (>90%), but fg_cls_accuracy stays low, oscillating around 20%. At inference time, this shows up as wrong and/or low-confidence classifications on the proposed bounding boxes.
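
To make the gap between the two metrics concrete, here is a minimal sketch. It assumes, as Detectron2's Fast R-CNN head appears to do, that cls_accuracy is computed over all sampled proposals (background included) while fg_cls_accuracy is computed only over proposals matched to a ground-truth object, with the background class stored at index `num_classes`. The function `classification_stats` and the toy numbers are made up for illustration and are not taken from the actual training run.

```python
def classification_stats(pred_classes, gt_classes, num_classes):
    """Return (cls_accuracy, fg_cls_accuracy) for one batch of proposals."""
    bg = num_classes  # assumption: background is the last class index
    pairs = list(zip(pred_classes, gt_classes))
    # Accuracy over every sampled proposal, background included.
    cls_accuracy = sum(p == g for p, g in pairs) / len(pairs)
    # Accuracy restricted to proposals whose ground truth is a real object.
    fg_pairs = [(p, g) for p, g in pairs if 0 <= g < bg]
    if fg_pairs:
        fg_cls_accuracy = sum(p == g for p, g in fg_pairs) / len(fg_pairs)
    else:
        fg_cls_accuracy = float("nan")
    return cls_accuracy, fg_cls_accuracy


NUM_CLASSES = 43
BG = NUM_CLASSES

# Toy batch: 90 background proposals and 10 foreground proposals of class 3.
gt = [BG] * 90 + [3] * 10
# Toy predictions: every background is right, but only 2 of the 10
# foreground proposals get the right class; the rest are called background.
pred = [BG] * 90 + [3] * 2 + [BG] * 8

cls_acc, fg_acc = classification_stats(pred, gt, NUM_CLASSES)
print(f"cls_accuracy={cls_acc:.2f}, fg_cls_accuracy={fg_acc:.2f}")
```

In this toy batch the overall accuracy is 0.92 while the foreground accuracy is only 0.20, because the many easy background proposals dominate the first number.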

Example 1 || Example 2 (examples showing only proposals with >15% confidence score)

As the examples show, the bounding box proposals seem fairly accurate, but the classification of those boxes is poor.

I have tried training for longer and lowering the LR further after a few hundred more iterations, but the model seems to have plateaued at these results.

My questions are:

  1. Why is cls_accuracy high but fg_cls_accuracy low? What does this tell me about the model's behavior, and is it normal?
  2. How can I try to improve classification accuracy? Which hyperparameters should I tinker with to hopefully get better results?
  3. Should I try to tinker with the model architecture? As shown in the code, I am using a pre-built Faster R-CNN implementation from Detectron2's model zoo and only changing some configs without touching the architecture itself. Could that be causing problems, e.g. because of image resolution and/or relative object size?

There are not many tutorials on the Detectron2 framework, and most of them only explain how to use it at a surface level. All of them seem to "just work" without much adjustment, so I am really at a loss as to what I could be doing wrong.

As I said, I am a beginner in AI/ML, so any advice would be greatly appreciated, whether about using the Detectron2 framework or about working with neural networks in general.
