I learned that we can use layer.get_weights() to get the weights and bias of a layer. It returns a list of length 2: the weights of the layer are stored at layer.get_weights()[0] and the bias is stored at layer.get_weights()[1] (if the bias was not disabled when the layer was defined). This is true for a normal convolutional layer.
I was recently using a separable convolution layer as one of the layers in my EfficientDet model:
layers.SeparableConv2D(num_channels, kernel_size=kernel_size, strides=strides, padding='same',
use_bias=True, name=str(name)+"/conv")
When I call the same layer.get_weights() function on this layer, it returns a list of length 3, where I was expecting a list of length 2 as above. I am a little confused about what the three values in the list are.
Any help and suggestions will be appreciated.
The SeparableConv2D layer computes a depthwise separable convolution which, unlike a normal convolution, requires 2 kernels (2 weight tensors). That is why get_weights() returns three arrays: the depthwise kernel, the pointwise kernel, and the bias. Without going into too much detail, the layer uses the first kernel to compute a depthwise convolution and then applies the second kernel to that result to compute a pointwise convolution. The main idea behind this is to reduce the number of parameters and therefore the number of computations.

Here is a simple example. Assume that we have a 28x28x3 input image (width, height, #channels) and we apply a normal 2D convolution (let's say 16 filters and a 5x5 kernel, with the default stride of 1 and no padding).
If we do the calculation, we end up with 5x5x3x16 (5x5 filter size, 3 input channels and 16 filters) = 1200 kernel parameters + 16 bias parameters (one per filter) = 1216. We can verify this in Keras: the model summary reports 1,216 parameters for the layer, and the kernel extracted from the layer's weights has shape (5, 5, 3, 16).
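The arithmetic for the normal convolution can be checked with a few lines of plain Python. This is only a sketch of the counting rule described above; the helper name conv2d_params is made up for illustration and is not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Count the parameters of a standard 2D convolution (hypothetical helper)."""
    # One kernel_h x kernel_w x in_channels weight block per output filter.
    kernel = kernel_h * kernel_w * in_channels * filters
    # One bias value per output filter, if the bias is enabled.
    bias = filters if use_bias else 0
    return kernel, bias


# 5x5 kernel, 3 input channels, 16 filters, as in the example above.
conv_kernel, conv_bias = conv2d_params(5, 5, 3, 16)
conv_total = conv_kernel + conv_bias
print(conv_kernel, conv_bias, conv_total)  # 1200 16 1216
```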
Now, let's consider a separable 2D convolution, which has 2 kernels. The depthwise kernel consists of a separate 5x5x1 weight matrix for each input channel, in our case 5x5x3 (5x5x3x1, to be consistent with 4D Keras tensors). This gives us 75 parameters.
The pointwise kernel is a simple 1x1 convolution (it operates on each input point) and it is used to increase the depth of the result to the number of specified filters. In our case - 1x1x3x16, which gives us 48 parameters.
In total, we have 75 parameters for the first kernel and 48 parameters for the second kernel which gives us 123 parameters plus, again, 16 bias parameters. That is 139 parameters.
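The same kind of check works for the separable case. The sketch below is plain Python with a made-up helper name (separable_conv2d_params). It assumes a depth multiplier of 1, which matches the 5x5x3x1 depthwise kernel described above; the depth_multiplier argument is included only to make that assumption explicit.

```python
def separable_conv2d_params(kernel_h, kernel_w, in_channels, filters,
                            depth_multiplier=1, use_bias=True):
    """Count the parameters of a depthwise separable 2D convolution (hypothetical helper)."""
    # Depthwise kernel: one kernel_h x kernel_w filter per input channel
    # (times depth_multiplier, assumed to be 1 here).
    depthwise = kernel_h * kernel_w * in_channels * depth_multiplier
    # Pointwise kernel: a 1x1 convolution mapping the depthwise output to `filters` channels.
    pointwise = 1 * 1 * in_channels * depth_multiplier * filters
    # One bias value per output filter, if the bias is enabled.
    bias = filters if use_bias else 0
    return depthwise, pointwise, bias


# 5x5 kernel, 3 input channels, 16 filters, as in the example above.
depthwise, pointwise, sep_bias = separable_conv2d_params(5, 5, 3, 16)
sep_total = depthwise + pointwise + sep_bias
print(depthwise, pointwise, sep_bias, sep_total)  # 75 48 16 139
```

Compared with the 1216 parameters of the normal convolution, the separable version needs 139 for the same number of output channels.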
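In Keras, the model summary for the equivalent SeparableConv2D layer reports 139 parameters. The output shape of this layer is exactly the same as that of the normal convolutional layer, but now we have 2 kernels with far fewer parameters. Again, we can extract the parameters for these 2 kernels from the layer's weights, which gives us a depthwise kernel of shape (5, 5, 3, 1) and a pointwise kernel of shape (1, 1, 3, 16).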
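To see the three entries of get_weights() directly, something like the following should work. It is a minimal sketch rather than the original code from this answer; it assumes TensorFlow 2.x is installed and reuses the 28x28x3 input, 16 filters and 5x5 kernel from the example. The labels list is just for printing.

```python
import tensorflow as tf

# Same setup as the example: 16 filters, 5x5 kernel, bias enabled.
layer = tf.keras.layers.SeparableConv2D(16, kernel_size=5, use_bias=True)

# Calling the layer once on a dummy 28x28x3 input creates its weights.
output = layer(tf.zeros((1, 28, 28, 3)))

weights = layer.get_weights()
labels = ["depthwise kernel", "pointwise kernel", "bias"]  # order returned by the layer
for label, w in zip(labels, weights):
    print(label, w.shape, w.size)

shapes = [w.shape for w in weights]
total_params = sum(w.size for w in weights)
print(total_params)
```

So for a SeparableConv2D layer with use_bias=True, get_weights()[0] is the depthwise kernel, get_weights()[1] is the pointwise kernel and get_weights()[2] is the bias.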
You can read this article if you want more detailed information about how separable convolution works.