Pretrained model returns old output even after adding a new layer


I have a pretrained model that I took from this repository (https://github.com/ViTAE-Transformer/RSP). The model is a ResNet50 with a final FC layer that outputs 51 classes. Since my problem has only 43 classes, I added another FC layer with an output size of 43. I also froze the weights of all layers except the last two FC layers (the one I added and the one that came with the model) so that only those two layers are fine-tuned. The only issue is that when I make the forward pass I still get 51 outputs, which is what the original model predicts, instead of 43 (almost as if the last layer was not registered by the model). Any idea what I am doing wrong?

import sys

import torch
import torch.nn as nn

# Add the RSP Scene Recognition repository to the path in order to load the pretrained model
sys.path.append("/home/imantha/workspace/RemSens_SSL/RSP/Scene Recognition")
from models.resnet import resnet50

# Load Model and Pretrained weights
path_to_weights = "pretrain_weights/rsp-aid-resnet-50-e300-ckpt.pth"
res50 = resnet50(num_classes = 51)
res50_state = torch.load(path_to_weights)
res50.load_state_dict(res50_state["model"])

# To fine-tune
# Freeze everything !!!
for param in res50.parameters():
    param.requires_grad = False

# Unfreeze the original last layer, as we want to fine-tune it too
res50.fc.weight.requires_grad = True
res50.fc.bias.requires_grad = True

# Add a new last layer, which will also have .requires_grad = True by default
res50.fc1 = nn.Linear(51, 43)

# some code to load the dataset and create the data loader
# ...

# Loop through data loader and make forward pass
for X, y in train_loader:
    yhat = res50(X)
    print(f"yhat.shape : {yhat.shape} , y.shape : {y.shape}")
    break

>>> yhat.shape : torch.Size([64, 51]) , y.shape : torch.Size([64, 43])

But if you print all the layers in the model, it looks fine:

print(res50)

>>>
...
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=51, bias=True)
  (fc1): Linear(in_features=51, out_features=43, bias=True)

However, if I follow this approach, it seems to work:

new_model = nn.Sequential(
    res50,
    nn.Linear(51,43)
)

for X, y in train_loader:
    yhat = new_model(X)
    print(f"yhat.shape : {yhat.shape} , y.shape : {y.shape}")
    break

>>>
yhat.shape : torch.Size([64, 43]) , y.shape : torch.Size([64, 43])
print(new_model)
>>>
...
      (2): Bottleneck(
        (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
    (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
    (fc): Linear(in_features=2048, out_features=51, bias=True)
  )
  (1): Linear(in_features=51, out_features=43, bias=True)
)

Any idea why the previous approach didn't work?

1 Answer

Answered by Karl

You should read the basic PyTorch documentation. You have to change the model's forward method so that it actually calls fc1.

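When you run res50.fc1 = nn.Linear(51, 43), all you're doing is assigning an nn.Linear to the attribute fc1. That registers the layer as a submodule, which is why it appears in print(res50), but it does not change what the model computes. forward is ordinary Python code that only calls the layers it was written to call, so the model has no way of knowing what fc1 is or when it is supposed to be used.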
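To make this concrete, here is a minimal sketch. TinyNet and TinyNetWithFc1 are hypothetical stand-ins for the ResNet50, not the RSP model itself, and the layer sizes other than 51 and 43 are arbitrary. It shows that an extra attribute is registered but never called, and that overriding forward is what actually puts it to use:

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the ResNet50: a feature layer followed by fc."""

    def __init__(self, in_features=8, hidden=16, num_classes=51):
        super().__init__()
        self.backbone = nn.Linear(in_features, hidden)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # forward only ever calls backbone and fc
        return self.fc(torch.relu(self.backbone(x)))


class TinyNetWithFc1(TinyNet):
    """Same network, but forward is overridden so that fc1 is actually applied."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(51, 43)

    def forward(self, x):
        return self.fc1(super().forward(x))


x = torch.randn(4, 8)

model = TinyNet()
model.fc1 = nn.Linear(51, 43)  # registered as a submodule, but never called

fc1_registered = "fc1" in dict(model.named_children())
out_shape_before = tuple(model(x).shape)            # forward ignores fc1
out_shape_after = tuple(TinyNetWithFc1()(x).shape)  # forward now uses fc1

print(fc1_registered, out_shape_before, out_shape_after)
```

This is also why the nn.Sequential version in the question works: nn.Sequential's own forward calls each of its children in order, so the extra nn.Linear is applied after res50.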

For your case, though, what you actually want to do is replace fc entirely. The fc layer predicts classes for the original task the model was trained on. It makes no sense to feed those values into a new layer predicting new classes: you would be asking the model to predict your classes from a weighted combination of the old class scores.

You need to replace the fc layer entirely and use the model's 2048 size latent representation to predict your output.

In this case you can simply assign res50.fc = nn.Linear(2048, 43). This works (as opposed to the fc1 case) because the model's forward method already uses fc.
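A minimal sketch of that replacement combined with the freezing from the question follows. Backbone is a hypothetical stand-in with a 2048-dimensional feature vector and an fc head, not the RSP ResNet50, and the input size, optimizer, and learning rate are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn


class Backbone(nn.Module):
    """Hypothetical stand-in: some feature extractor followed by an fc head."""

    def __init__(self, feat_dim=2048, num_classes=51):
        super().__init__()
        self.features = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 8 * 8, feat_dim), nn.ReLU()
        )
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))


model = Backbone()

# Freeze every pretrained parameter
for param in model.parameters():
    param.requires_grad = False

# Replace the head; a freshly created layer has requires_grad=True by default
model.fc = nn.Linear(model.fc.in_features, 43)

# Only the new head should be trainable
trainable = [name for name, p in model.named_parameters() if p.requires_grad]

# Pass only the trainable parameters to the optimizer (SGD is an arbitrary choice)
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01
)

out_shape = tuple(model(torch.randn(4, 3, 8, 8)).shape)
print(trainable, out_shape)
```

Reading the input size from model.fc.in_features avoids hard-coding 2048, so the same lines should carry over to res50 as long as its head is an nn.Linear named fc, as the printed model in the question suggests.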