Is there a way to monitor how the shapes change inside a pretrained VGG19 model?


I am working with a pretrained VGG19 model whose initial 8 convolution layers are frozen. My images are 3 x 400 x 400 and I do not want to resize them, since that may affect performance. I keep getting an error that the matrices cannot be multiplied, so is there a way to pass a 3 x 400 x 400 image through VGG19?

from torchvision.models import vgg19, VGG19_Weights

weights = VGG19_Weights.DEFAULT
vgg19_layer = vgg19(weights=weights)

# Freeze the parameters of the first 16 modules in features
# (convolutions, ReLUs and pooling layers)
for i in range(16):
    for param in vgg19_layer.features[i].parameters():
        param.requires_grad = False

This is how I am using vgg19_layer.

self.vgg19 = vgg19_layer

This is how I am currently sending the input.

x = torch.randn((3, 400, 400))

model(x)

Output

RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x49 and 25088x4096)

There are 2 answers

Answer by ibra ndiaye:

The problem is that the shape of your input tensor is incorrect. The model expects a tensor of shape [B, 3, 400, 400], where B is the batch size, 3 is the number of RGB color channels, and 400 x 400 is the image resolution. If you send a single image at a time, it needs shape [1, 3, 400, 400], not [3, 400, 400]. You can add the batch dimension with torch.unsqueeze() and remove it from the output with torch.squeeze().

Also note that instead of explicitly setting requires_grad to False, you can put the model in evaluation mode (model.eval()) and run your inference function under torch.no_grad(), which works both as a context manager and as a decorator; this disables all gradient computation within the function.


Finally, gradients are unnecessary during inference, so skipping them with torch.no_grad() saves memory and speeds up your code. Remember to call model.eval() before running inference: it switches all layers to evaluation mode, which matters for layers like dropout and batch normalization that behave differently during training and evaluation.
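Putting this together, a minimal sketch of the fix (assuming the torchvision VGG19 from the question):

import torch
from torchvision.models import vgg19, VGG19_Weights

model = vgg19(weights=VGG19_Weights.DEFAULT)
model.eval()  # dropout layers in the classifier switch to inference behavior

x = torch.randn(3, 400, 400)  # a single image without a batch dimension
x = x.unsqueeze(0)            # -> [1, 3, 400, 400]

with torch.no_grad():         # no gradient tracking during inference
    out = model(x)            # -> [1, 1000] class scores

out = out.squeeze(0)          # -> [1000]

Note that torchvision's VGG19 contains an adaptive average pooling layer before the classifier, so a 400 x 400 input works once the batch dimension is present.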

Answer by Anna Andreeva Rogotulka:

To get a summary of the layers of your model, use torchsummary; an example is below.

from torchsummary import summary
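# CNNNetwork is the answerer's own model; the tuple is the input size (C, H, W)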
summary(CNNNetwork().cuda(), (1, 64, 16))



----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 16, 64, 16]             416
       BatchNorm2d-2           [-1, 16, 64, 16]              32
              ReLU-3           [-1, 16, 64, 16]               0
         MaxPool2d-4            [-1, 16, 32, 8]               0
            Conv2d-5            [-1, 32, 32, 8]          12,832
              ReLU-6            [-1, 32, 32, 8]               0
         MaxPool2d-7            [-1, 32, 16, 4]               0
       BatchNorm2d-8            [-1, 32, 16, 4]              64
            Conv2d-9            [-1, 64, 18, 6]          18,496
             ReLU-10            [-1, 64, 18, 6]               0
        MaxPool2d-11             [-1, 64, 9, 3]               0
      BatchNorm2d-12             [-1, 64, 9, 3]             128
           Conv2d-13           [-1, 128, 11, 5]          73,856
             ReLU-14           [-1, 128, 11, 5]               0
        MaxPool2d-15            [-1, 128, 5, 2]               0
      BatchNorm2d-16            [-1, 128, 5, 2]             256
          Flatten-17                 [-1, 1280]               0
           Linear-18                   [-1, 35]          44,835
================================================================
Total params: 150,915
Trainable params: 150,915
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.83
Params size (MB): 0.58
Estimated Total Size (MB): 1.41
----------------------------------------------------------------
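The same tool might be applied to the VGG19 from the question, for example (assuming torchsummary is installed and a CUDA device is available):

from torchsummary import summary
from torchvision.models import vgg19, VGG19_Weights

model = vgg19(weights=VGG19_Weights.DEFAULT)
summary(model.cuda(), (3, 400, 400))  # prints each layer's output shape

Alternatively, a plain-PyTorch sketch using forward hooks can print the shapes as they change through the network, with no extra dependency:

import torch
from torchvision.models import vgg19, VGG19_Weights

model = vgg19(weights=VGG19_Weights.DEFAULT).eval()

def report(name):
    def hook(module, inputs, output):
        print(f"{name:25s} -> {tuple(output.shape)}")
    return hook

# Attach a hook to every leaf module so each prints its output shape
for name, module in model.named_modules():
    if len(list(module.children())) == 0:
        module.register_forward_hook(report(name))

with torch.no_grad():
    model(torch.randn(1, 3, 400, 400))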