I have been trying to train an object detection model using the TensorFlow Object Detection API. The network trains well when batch_size is 1; however, increasing the batch_size leads to the following error after some steps.

Network: Faster R-CNN
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 25000
            learning_rate: .00002
          }
          schedule {
            step: 50000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
}
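For reference, the manual_step_learning_rate block above keeps the initial rate until a schedule boundary is crossed, then switches to that boundary's rate. A minimal sketch of that behavior (the helper name and defaults are mine, mirroring the config above):

```python
def manual_step_lr(step, initial_lr=0.0002,
                   schedule=((25000, 0.00002), (50000, 0.000002))):
    """Return the learning rate in effect at `step` under a manual step schedule."""
    lr = initial_lr
    for boundary, boundary_lr in schedule:
        # The last boundary that `step` has passed determines the rate.
        if step >= boundary:
            lr = boundary_lr
    return lr

print(manual_step_lr(0))      # 0.0002
print(manual_step_lr(30000))  # 2e-05
print(manual_step_lr(60000))  # 2e-06
```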
Error:
INFO:tensorflow:Error reported to Coordinator: , ConcatOp : Dimensions of inputs should match: shape[0] = [1,841,600,3] vs. shape[3] = [1,776,600,3]
[[node concat (defined at /home/<>/.virtualenvs/dl4cv/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/legacy/trainer.py:190) ]]
Errors may have originated from an input operation.
Input Source operations connected to node concat:
Preprocessor_3/sub (defined at /home/<>/.virtualenvs/dl4cv/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/models/faster_rcnn_inception_v2_feature_extractor.py:100)
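The shapes in the message ([1,841,600,3] vs. [1,776,600,3]) differ in height, and tensors of different shapes cannot be concatenated into one batch. The failure can be reproduced in miniature with plain NumPy (illustrative only, not the API's actual code):

```python
import numpy as np

# Two "images" with different heights, as in the error message above:
img_a = np.zeros((841, 600, 3), dtype=np.float32)
img_b = np.zeros((776, 600, 3), dtype=np.float32)

try:
    np.stack([img_a, img_b])  # batching requires identical shapes
except ValueError as e:
    print("cannot batch:", e)
```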
Training with an increased batch_size does work with SSD MobileNet, however.
While I have solved the issue for my use case for the moment, I am posting this question on SO to understand the reason for this behavior.
When you increase the batch size, all of the images loaded into the tensor must have the same size.
You can get the images to all be the same size by padding them: the keep_aspect_ratio_resizer in the image_resizer config has a pad_to_max_dimension option, and setting it to true pads every image up to the maximum dimension. All images in a batch then have the same size, which enables a batch size larger than one. (This is also why SSD MobileNet trains fine with a larger batch size: its config typically uses a fixed_shape_resizer, which already resizes every image to one fixed size.)
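A minimal sketch of the padding idea in plain NumPy (illustrative only; the API does the equivalent internally when padding is enabled — the helper name is mine):

```python
import numpy as np

def pad_to_max(images):
    """Zero-pad (H, W, C) images to the largest H and W in the list so they can be batched."""
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    padded = [
        np.pad(img,
               ((0, max_h - img.shape[0]), (0, max_w - img.shape[1]), (0, 0)),
               mode="constant")
        for img in images
    ]
    return np.stack(padded)  # shape: (batch, max_h, max_w, C)

batch = pad_to_max([
    np.ones((841, 600, 3), dtype=np.float32),
    np.ones((776, 600, 3), dtype=np.float32),
])
print(batch.shape)  # (2, 841, 600, 3)
```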