I am using a pretrained model from this repository (https://github.com/ViTAE-Transformer/RSP). The model is a ResNet50 whose final FC layer outputs 51 classes. My problem has only 43 classes, so I added another FC layer with 43 outputs. I also froze the weights of all layers except the last two FC layers (the one I added and the one that came with the model) so that only those two are fine-tuned. The issue is that the forward pass still returns 51 classes, which is what the original model predicts, instead of 43 (almost as if the last layer was not registered by the model). Any idea what I am doing wrong?
import sys
import torch
from torch import nn
# Add the RSP Scene Recognition repository to the path so the pretrained model can be imported
sys.path.append("/home/imantha/workspace/RemSens_SSL/RSP/Scene Recognition")
from models.resnet import resnet50
# Load Model and Pretrained weights
path_to_weights = "pretrain_weights/rsp-aid-resnet-50-e300-ckpt.pth"
res50 = resnet50(num_classes = 51)
res50_state = torch.load(path_to_weights)
res50.load_state_dict(res50_state["model"])
# To fine-tune
# Freeze everything
for param in res50.parameters():
    param.requires_grad = False
# Unfreeze last layer as we want to finetune it too
res50.fc.weight.requires_grad = True
res50.fc.bias.requires_grad = True
# Add last layer which will also have .requires_grad = True
res50.fc1 = nn.Linear(51, 43)
# Some code to load the dataset and create the data loader
# ...
# Loop through the data loader and make a forward pass
for X, y in train_loader:
    yhat = res50(X)
    print(f"yhat.shape : {yhat.shape} , y.shape : {y.shape}")
    break
>>> yhat.shape : torch.Size([64, 51]) , y.shape : torch.Size([64, 43])
But if you look at all the layers in the model, everything looks fine:
print(res50)
>>>
...
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=2048, out_features=51, bias=True)
(fc1): Linear(in_features=51, out_features=43, bias=True)
However, if I follow this approach, it seems to work:
new_model = nn.Sequential(
    res50,
    nn.Linear(51, 43)
)
for X, y in train_loader:
    yhat = new_model(X)
    print(f"yhat.shape : {yhat.shape} , y.shape : {y.shape}")
    break
>>>
yhat.shape : torch.Size([64, 43]) , y.shape : torch.Size([64, 43])
print(new_model)
>>>
...
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=2048, out_features=51, bias=True)
)
(1): Linear(in_features=51, out_features=43, bias=True)
)
Any idea why the previous approach didn't work?
You should read the basic PyTorch documentation. You have to change the forward method of the model to include the use of fc1.
When you run res50.fc1 = nn.Linear(51, 43), all you're doing is assigning an nn.Linear to the attribute fc1. The layer is registered as a submodule, which is why it shows up in print(res50), but you haven't changed what the model computes, because forward never calls it. How do you expect the model to know what fc1 is or when it is supposed to be used?
For your case, though, what you actually want to do is replace fc entirely. The fc layer predicts classes for the original task the model was trained on. It makes little sense to feed those values into a new layer predicting new classes: you are asking the model to predict your classes from a weighted combination of the old class scores.
You need to replace the fc layer entirely and use the model's 2048-dimensional latent representation to predict your output.
In this case you can simply assign res50.fc = nn.Linear(2048, 43). This works (as opposed to the fc1 case) because the model's forward method already uses fc.
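To make the difference concrete, here is a minimal sketch. It uses a small hypothetical stand-in model (TinyNet, not the RSP ResNet50) so that it runs without the repository or the checkpoint. The only assumption carried over from the question is that forward ends by calling fc on a 2048-dimensional feature vector. The dummy batch size, the backbone placeholder, and the SGD learning rate are illustrative choices, not values from the original post.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the ResNet50: a backbone followed by fc."""

    def __init__(self, feat_dim: int = 2048, num_classes: int = 51):
        super().__init__()
        # Cheap placeholder for the convolutional backbone and avgpool.
        self.backbone = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(3, feat_dim),
        )
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        # Only modules that are called here take part in the forward pass.
        return self.fc(self.backbone(x))


torch.manual_seed(0)
model = TinyNet()
x = torch.randn(4, 3, 8, 8)  # dummy batch of 4 RGB images

# 1) Adding a new attribute registers the layer, but forward() never calls it.
model.fc1 = nn.Linear(51, 43)
shape_with_fc1 = tuple(model(x).shape)
print("with extra fc1:", shape_with_fc1)
del model.fc1

# 2) Freeze everything, then replace the head that forward() already uses.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 43)  # new layers are trainable by default
shape_new_head = tuple(model(x).shape)
print("with replaced fc:", shape_new_head)

# Only the new head should be left trainable; hand just those parameters to the optimizer.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print("trainable parameters:", trainable)
optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=1e-3)
```

With the real model the same two lines apply: freeze the parameters, then assign res50.fc = nn.Linear(2048, 43). If you really do want to keep the 51-class layer and stack a second layer on top, wrapping the model in nn.Sequential (as in the question) or subclassing it and overriding forward are the ways to get the extra layer called.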