I am trying to train a resnet for 32x32 images, and I came upon a tutorial: https://towardsdatascience.com/resnets-for-cifar-10-e63e900524e0, which applies to cifar-10 (32x32 image dataset), but I don't understand what its saying.
This is from the site:
"The rest of the notes from the authors to construct the ResNet are:
Use a stack of 6 layers of 3x3 convolutions. The choice of will determine the size of our ResNet.
The feature map sizes are {32, 16, 8} respectively with 2 convolutions for each feature map size. Also, the number of filters is {16, 32, 64} respectively.
The down sampling of the volumes through the ResNet is achieved increasing the stride to 2, for the first convolution of each layer. Therefore, no pooling operations are used until right before the dense layer.
For the bypass connections, no projections will be used. In the cases where there is a different in the shape of the volume, the input will be simply padded with zeros, so the output size matched the size of the volume before the addition.
This would leave Figure 4 as the representation of our first layer. In this case, our bypass connection is a regular Identity Shortcut because the dimensionality of the volume is constant thorough the layer operations. Since we chose n=1, 2 convolutions are applied within the layer 1.
We still can check from Figure 2 that the output volume of Layer1 is indeed 32x32x16. Let’s go deeper!"
I am confused why the number of channels in the output is 16, when the number of filters was {16, 32, 64}
Note that the number of filters for layer 1 is 16, and for layer 2 is 32 and layer 3 is 64.