I am new to deep learning. I'm using the TensorFlow 2 Object Detection API to fine-tune a Faster R-CNN model (pretrained on the COCO 2017 dataset) on the Mapillary dataset to detect road signs. The training set contains 36,589 images (with 180,287 bounding boxes) and the validation set contains 5,320 images (with 26,101 bounding boxes).
But I'm experiencing a weird behaviour: the training and validation losses seem to converge (I have also tried different learning rates), but the mAP remains 0.
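To rule out an obvious data problem (a common cause of mAP staying at 0), the TFRecords can be inspected with a short script like the one below. This is a minimal sketch: the paths are mine, and the feature keys are the standard TF2 OD API ones.

import tensorflow as tf

# Decode a few training examples and print their labels/boxes, to verify
# that the class ids match label_map.pbtxt and that boxes are present.
dataset = tf.data.TFRecordDataset("data/training/train.tfrecord")
for raw in dataset.take(3):
    example = tf.train.Example.FromString(raw.numpy())
    feats = example.features.feature
    labels = list(feats["image/object/class/label"].int64_list.value)
    texts = [t.decode() for t in feats["image/object/class/text"].bytes_list.value]
    xmins = feats["image/object/bbox/xmin"].float_list.value
    print(f"{len(xmins)} boxes, labels={labels[:5]}, classes={texts[:5]}")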
Using the following configuration:
# Faster R-CNN with Resnet-101 (v1)
# Trained on COCO, initialized from Imagenet classification checkpoint
# This config is TPU compatible.
model {
  faster_rcnn {
    num_classes: 314
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 1024
        max_dimension: 1024
        pad_to_max_dimension: true
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101_keras'
      batch_norm_trainable: false  # fine-tuning: false | from-scratch: true
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0  # no regularization?
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0  # no regularization?
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        share_box_across_classes: false  # it shouldn't be needed in this case since a box has only one class
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX  # applies SOFTMAX on input detection scores
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    use_static_shapes: true
    use_matmul_crop_and_resize: true
    clip_anchors_to_image: true
    use_static_balanced_label_sampler: true
    use_matmul_gather_in_matcher: true
  }
}
train_config: {
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 10000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .0000004
          total_steps: 10000
          warmup_learning_rate: .000000133
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  add_regularization_loss: true
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "pretrained-model/faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  # data_augmentation_options {
  #   random_horizontal_flip {
  #   }
  # }
  # data_augmentation_options {
  #   random_adjust_hue {
  #   }
  # }
  # data_augmentation_options {
  #   random_adjust_contrast {
  #   }
  # }
  # data_augmentation_options {
  #   random_adjust_saturation {
  #   }
  # }
  # data_augmentation_options {
  #   random_square_crop_by_scale {
  #     scale_min: 0.6
  #     scale_max: 1.3
  #   }
  # }
  # merge_multiple_label_boxes: false
  max_number_of_boxes: 206388  # this is the total number considering the entire dataset
  unpad_groundtruth_tensors: false
  use_bfloat16: false  # works only on TPUs
}
train_input_reader: {
  label_map_path: "data/training/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "data/training/train.tfrecord"
  }
}
eval_config: {
  min_score_threshold: 0.5
  batch_size: 1
  # num_examples: 100
  num_visualizations: 10
  metrics_set: "pascal_voc_detection_metrics"
  use_moving_averages: false
  eval_interval_secs: 30
  # max_evals: 10
  include_metrics_per_category: true
}
eval_input_reader: {
  label_map_path: "data/validation/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "data/validation/train.tfrecord"
  }
}
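For completeness, the file can be loaded with the API's own config utilities to make sure it parses. A minimal sketch; the pipeline.config path below is illustrative:

from object_detection.utils import config_util

# Parse the pipeline config with the TF2 OD API utilities and echo back a
# few of the fields discussed above.
configs = config_util.get_configs_from_pipeline_file("models/my_model/pipeline.config")  # illustrative path
print(configs["model"].faster_rcnn.num_classes)  # 314
print(configs["train_config"].batch_size)        # 1
print(configs["eval_config"].metrics_set)        # ["pascal_voc_detection_metrics"]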
I get the following result: all of the mAP values are 0 (both the total and the per-category ones).
Whereas if I change the learning rate base to .00000004, the warmup learning rate to .0000000133, the batch_size to 8 and the num_steps to 50000, I get the following result (the model is still training): the per-category metrics are all still 0, but the total mAP seems to start improving a little, although it is still very low.
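For context, as I understand cosine_decay_learning_rate, it is a linear warmup followed by a cosine decay, so the two runs peak at learning rates an order of magnitude apart. A rough sketch of the schedule (hold_base_rate_steps assumed to be 0):

import math

def cosine_lr(step, base, total_steps, warmup_lr, warmup_steps):
    """Approximation of the OD API's cosine_decay_learning_rate."""
    if step < warmup_steps:
        # Linear warmup from warmup_lr up to base.
        return warmup_lr + (base - warmup_lr) * step / warmup_steps
    if step >= total_steps:
        return 0.0
    # Cosine decay from base down to ~0 at total_steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base * (1 + math.cos(math.pi * progress))

# First run: base=4e-7 over 10k steps; second run: base=4e-8 over 50k steps.
for step in (0, 2000, 5000, 25000):
    print(step,
          cosine_lr(step, 4e-7, 10000, 1.33e-7, 2000),
          cosine_lr(step, 4e-8, 50000, 1.33e-8, 2000))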
I suppose this behaviour could be caused by some wrong hyperparameters. Should I change something in the config file? Also, which of the two configurations do you think is better?
Note: before using the entire dataset, I trained the model on a very small dataset (with a higher learning rate), and when evaluating on that same dataset the model performed quite well (though it was probably overfitting). Switching to the entire dataset leads to this strange behaviour.