I am new to deep learning. I'm using the TensorFlow 2 Object Detection API to fine-tune a Faster R-CNN model (pretrained on the COCO 2017 dataset) on the Mapillary dataset to detect road signs. The training set contains 36,589 images (with 180,287 bounding boxes) and the validation set contains 5,320 images (with 26,101 bounding boxes).
But I'm experiencing a weird behaviour: the training and validation losses seem to converge (I have also tried different learning rates), but the mAP remains 0.
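To rule out an obvious data problem (a common cause of mAP staying at 0), the TFRecords can be inspected with a short script like the one below. This is a minimal sketch: the paths are mine, and the feature keys are the standard TF2 OD API ones.

import tensorflow as tf

# Decode a few training examples and print their labels/boxes, to verify
# that the class ids match label_map.pbtxt and that boxes are present.
dataset = tf.data.TFRecordDataset("data/training/train.tfrecord")
for raw in dataset.take(3):
    example = tf.train.Example.FromString(raw.numpy())
    feats = example.features.feature
    labels = list(feats["image/object/class/label"].int64_list.value)
    texts = [t.decode() for t in feats["image/object/class/text"].bytes_list.value]
    xmins = feats["image/object/bbox/xmin"].float_list.value
    print(f"{len(xmins)} boxes, labels={labels[:5]}, classes={texts[:5]}")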
Using the following configuration:
# Faster R-CNN with Resnet-101 (v1)
# Trained on COCO, initialized from Imagenet classification checkpoint
# This config is TPU compatible.
model {
  faster_rcnn {
    num_classes: 314
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 1024
        max_dimension: 1024
        pad_to_max_dimension: true
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101_keras'
      batch_norm_trainable: false  # fine-tuning: false | from-scratch: true
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0  # no regularization?
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0  # no regularization?
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        share_box_across_classes: false  # it shouldn't be needed in this case since a box has only one class
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX  # applies SOFTMAX on input detection scores
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    use_static_shapes: true
    use_matmul_crop_and_resize: true
    clip_anchors_to_image: true
    use_static_balanced_label_sampler: true
    use_matmul_gather_in_matcher: true
  }
}
train_config: {
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 10000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .0000004
          total_steps: 10000
          warmup_learning_rate: .000000133
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  add_regularization_loss: true
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "pretrained-model/faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  # data_augmentation_options {
  #   random_horizontal_flip {
  #   }
  # }
  # data_augmentation_options {
  #   random_adjust_hue {
  #   }
  # }
  # data_augmentation_options {
  #   random_adjust_contrast {
  #   }
  # }
  # data_augmentation_options {
  #   random_adjust_saturation {
  #   }
  # }
  # data_augmentation_options {
  #   random_square_crop_by_scale {
  #     scale_min: 0.6
  #     scale_max: 1.3
  #   }
  # }
  # merge_multiple_label_boxes: false
  max_number_of_boxes: 206388  # this is the total number considering the entire dataset
  unpad_groundtruth_tensors: false
  use_bfloat16: false  # works only on TPUs
}
train_input_reader: {
  label_map_path: "data/training/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "data/training/train.tfrecord"
  }
}
eval_config: {
  min_score_threshold: 0.5
  batch_size: 1
  # num_examples: 100
  num_visualizations: 10
  metrics_set: "pascal_voc_detection_metrics"
  use_moving_averages: false
  eval_interval_secs: 30
  # max_evals: 10
  include_metrics_per_category: true
}
eval_input_reader: {
  label_map_path: "data/validation/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "data/validation/train.tfrecord"
  }
}
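For completeness, the file can be loaded with the API's own config utilities to make sure it parses. A minimal sketch; the pipeline.config path below is illustrative:

from object_detection.utils import config_util

# Parse the pipeline config with the TF2 OD API utilities and echo back a
# few of the fields discussed above.
configs = config_util.get_configs_from_pipeline_file("models/my_model/pipeline.config")  # illustrative path
print(configs["model"].faster_rcnn.num_classes)  # 314
print(configs["train_config"].batch_size)        # 1
print(configs["eval_config"].metrics_set)        # ["pascal_voc_detection_metrics"]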
I get the following result: all of the mAP values are 0 (both the total and the per-category ones).
Whereas if I change the learning rate base to .00000004, the warmup learning rate to .0000000133, the batch_size to 8 and the num_steps to 50000, I get the following result (the model is still training): the per-category metrics are all still 0, but the total mAP seems to start improving a little, although it is still very low.
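For context, as I understand cosine_decay_learning_rate, it is a linear warmup followed by a cosine decay, so the two runs peak at learning rates an order of magnitude apart. A rough sketch of the schedule (hold_base_rate_steps assumed to be 0):

import math

def cosine_lr(step, base, total_steps, warmup_lr, warmup_steps):
    """Approximation of the OD API's cosine_decay_learning_rate."""
    if step < warmup_steps:
        # Linear warmup from warmup_lr up to base.
        return warmup_lr + (base - warmup_lr) * step / warmup_steps
    if step >= total_steps:
        return 0.0
    # Cosine decay from base down to ~0 at total_steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base * (1 + math.cos(math.pi * progress))

# First run: base=4e-7 over 10k steps; second run: base=4e-8 over 50k steps.
for step in (0, 2000, 5000, 25000):
    print(step,
          cosine_lr(step, 4e-7, 10000, 1.33e-7, 2000),
          cosine_lr(step, 4e-8, 50000, 1.33e-8, 2000))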
I suppose this behaviour could be caused by some wrong hyperparameters. Should I change something in the config file? Also, which of the two configurations do you think is better?
Note: before using the entire dataset, I trained the model on a very small dataset (with a higher learning rate), and when evaluating on that same dataset the model performed quite well (though it was probably overfitting). Switching to the entire dataset leads to this strange behaviour.