How can MobileNetV2 have the same number of parameters for different custom input shapes?


I'm following the TensorFlow 2 tutorial on fine-tuning and transfer learning, using MobileNetV2 as the base architecture.

The first thing I noticed is that the largest input shape available for the pre-trained 'imagenet' weights is (224, 224, 3). I tried a custom shape of (640, 640, 3) and, as per the documentation, it gives a warning saying that the weights for the (224, 224, 3) shape were loaded instead.

So if I load a network like this:

import tensorflow as tf

tf.keras.backend.clear_session()

def create_model():
  # include_top=False drops the classification head; 'imagenet' weights are loaded by default
  base_model = tf.keras.applications.MobileNetV2(input_shape=(640, 640, 3),
                                                 include_top=False)
  x = base_model.output
  x = tf.keras.layers.GlobalAveragePooling2D()(x)
  x = tf.keras.layers.Dense(1, activation='sigmoid')(x)
  model = tf.keras.Model(inputs=base_model.inputs, outputs=x)
  model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
                loss='binary_crossentropy',
                metrics=[tf.keras.metrics.BinaryAccuracy()])
  return model

tf_model = create_model()

It gives the warning:

WARNING:tensorflow:`input_shape` is undefined or non-square, or `rows` is not in [96, 128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.

If I use an input shape like (224, 224, 3), the warning vanishes. Nevertheless, I checked the number of trainable parameters in both cases using

tf_model.summary()

and found that the number of trainable parameters is the same in both cases:

Total params: 2,259,265
Trainable params: 2,225,153
Non-trainable params: 34,112

even though the spatial size of the convolutional feature maps changes according to the custom input shape. So how can the number of parameters remain the same even when the feature maps have bigger (spatial) sizes?


There are 2 answers

NoRest NR On BEST ANSWER

You're right. The number of conv parameters depends only on the kernel size, the number of channels of each layer, and the total number of layers.

However, the problem when you change the input resolution (here 640x640x3) is that the final feature map, right before the fully connected layer, won't have the same dimensions as in the network trained with 224x224x3 inputs. Thus, it's not compatible as is.

Why?

Example with a 224x224x3 input resolution:

  1. The stem convolution has stride = 2, so the output of layer 1 is 112x112x32
  2. The first bottleneck block has stride = 1, so its output is 112x112x16
  3. The second bottleneck block has stride = 2, so its output is 56x56x24
  4. and so on...

The strides determine the resolution of the intermediate feature maps. The last feature map would be larger with a 640x640x3 input, so a fully connected layer sized for the 224x224 network is not compatible. You should transfer the convolutional weights learned from the vanilla model (at 224x224 resolution) to a new convnet built for 640x640x3 input data.
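The arithmetic above can be sketched without loading the model. A back-of-the-envelope check, assuming MobileNetV2's total stride of 32 and final feature depth of 1280 (with `include_top=False`):

```python
# MobileNetV2 downsamples by a total stride of 32 before the head,
# and its final feature depth (with include_top=False) is 1280.
total_stride = 32
channels = 1280

for rows in (224, 640):
    feat = rows // total_stride
    print(f"{rows}x{rows} input -> {feat}x{feat}x{channels} feature map "
          f"({feat * feat * channels} values if flattened)")
```

A fully connected layer sized for the 7x7x1280 = 62,720 flattened features of the 224x224 network cannot accept the 512,000 features produced at 640x640, which is why global average pooling (which always reduces to a 1280-vector) is commonly used instead.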

Atresmo On

After checking in more detail, it seems that the number of parameters depends on the kernel sizes and the number of filters of each convolutional layer, on the number of neurons in the final fully connected layer, and on the Batch Normalization layers in between.

None of these depend on the spatial size of the input images: the spatial resolution at the output of each convolutional layer may change, but the convolutional kernels themselves keep the same size (e.g. 3x3x3), so the number of parameters stays fixed.

The number of parameters of this kind of network (i.e. a convolutional neural network) is therefore independent of the spatial size of the input. Nevertheless, the number of input channels must match what the pre-trained weights expect (e.g. 3 for an RGB image with the 'imagenet' weights).
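As a sanity check, the per-layer counts can be computed by hand. A small sketch (the helper names are my own, not Keras API):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    # One weight per kernel position, per input channel, per filter, plus biases.
    return kernel_h * kernel_w * in_channels * filters + (filters if use_bias else 0)

def batchnorm_params(channels):
    # gamma and beta (trainable) plus moving mean and moving variance (non-trainable).
    return 4 * channels

# A 3x3 convolution mapping 3 -> 32 channels costs the same
# whether the input is 224x224 or 640x640:
print(conv2d_params(3, 3, 3, 32))   # 896
print(batchnorm_params(32))         # 128
```

No term in either formula involves the input's height or width, which is exactly why `model.summary()` reports identical totals for both input shapes.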