I'm using Detectron2 for a segmentation task in which an object has to be classified into one of 4 classes, so I used COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml. I applied 4 kinds of augmentation transforms, and after training the total loss is about 0.1.
But for some reason the bounding boxes are inaccurate on some images in the test set: the bbox is drawn too large or too small, or it doesn't cover the whole object.
Moreover, the predictor sometimes draws several bboxes, as if there were several different objects, although there is only a single object.
Are there any suggestions on how to improve its accuracy?
Are there any good-practice approaches to resolving this issue?
Any suggestions or reference material would be helpful.
 
                        
I would suggest the following:
The biggest improvement will come from (2).
Once you have a decent baseline, you could also experiment with augmentations.
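
On the duplicate boxes around a single object: it may be worth checking whether the overlapping detections carry different class labels. As far as I know, Detectron2's ROI heads apply non-maximum suppression per class, so two boxes with different predicted classes do not suppress each other. Below is a minimal sketch of a class-agnostic filter in plain Python. The function names `iou` and `class_agnostic_nms`, the 0.5 threshold, and the sample boxes are made up for illustration and are not part of Detectron2; in practice the boxes and scores would come from the predictor output (`outputs["instances"].pred_boxes` and `.scores`).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(boxes, scores, iou_thresh=0.5):
    """Return indices of the boxes to keep, highest score first.

    Class labels are ignored on purpose: a box is dropped whenever it
    overlaps an already kept, higher-scoring box by more than iou_thresh.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicate boxes on one object plus a
# separate, non-overlapping box.
boxes = [(10, 10, 110, 110), (12, 8, 115, 112), (300, 300, 350, 350)]
scores = [0.90, 0.75, 0.60]
kept = class_agnostic_nms(boxes, scores)
```

If the duplicates disappear after such a filter, the extra boxes were most likely competing class predictions for the same object rather than localization errors, which points at the classification side of the model and not the box regression.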