model.parameters() vs model.state_dict() - which one gives the correct number of parameters in PyTorch?


I have created a modified version of ViT-base by coding from scratch. This version contains all the layers of the vision transformer, plus some additional layers. The number of parameters of a model can be found using this function:

def num_of_params(model):
    # Sum the element counts of all tensors yielded by model.parameters()
    return sum(param.numel() for param in model.parameters())

The built-in ViT-base model has 86859496 parameters, according to the output of num_of_params(model). But when I create my modified ViT, it shows only 24095081 as the output of num_of_params(model), though in theory, it should have more parameters than the built-in ViT-base model.

I have written another function to count the number of parameters using state_dict:

def count_from_state_dict(model):
    # Sum the element counts of every entry in the state dict.
    # state_dict() is called once instead of once per key.
    total = 0
    for tensor in model.state_dict().values():
        total += tensor.numel()

    return total

When I use this function, I get the correct number of parameters (93272228) for my modified ViT model. In this model, model.state_dict().keys() contains only weight and bias tensors, the CLS token, and the positional embedding weights. It does not contain any buffers.

Why is there such an inconsistency in the number of parameters between these two methods?
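One behaviour that can produce a gap like this, offered as a possibility since the model code is not shown: model.parameters() yields each distinct parameter tensor only once, while model.state_dict() lists every registered name, even when several names point to the same tensor. The sketch below uses a hypothetical toy module (SharedBlocks is not from the question) in which one nn.Linear instance is reused in three slots.

```python
import torch.nn as nn


class SharedBlocks(nn.Module):
    """Hypothetical toy model that reuses ONE Linear layer in three slots."""

    def __init__(self):
        super().__init__()
        layer = nn.Linear(4, 4)                   # 4*4 weights + 4 biases = 20 elements
        self.blocks = nn.ModuleList([layer] * 3)  # the same object registered three times

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


model = SharedBlocks()

# parameters() skips tensors it has already yielded, so the shared layer counts once.
n_params = sum(p.numel() for p in model.parameters())

# state_dict() has one entry per registered name, so the shared layer counts three times.
n_state = sum(t.numel() for t in model.state_dict().values())

print(n_params, n_state)  # 20 60
```

If the modified ViT builds its extra layers by repeating a single module instance (for example with list multiplication), the two counting functions would disagree in the same way, and the smaller number would be the count of independently trainable values.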

When I create the ViT-base model from scratch (no additional layer), num_of_params(model) shows the correct number of parameters, i.e. 86859496. Then it also matches with the output of count_from_state_dict(model).

So I am getting a much lower value for the number of parameters in the first method only when I have added some layers to the ViT-base model.

Am I missing something?
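A quick way to test whether shared tensors are involved is to group the state-dict keys by the tensor object they refer to. This is only a diagnostic sketch: find_shared_entries is a hypothetical helper name, and the demo model is invented for illustration. It relies on state_dict(keep_vars=True), which returns the registered tensor objects themselves rather than detached copies.

```python
from collections import defaultdict

import torch.nn as nn


def find_shared_entries(model):
    """Return groups of state_dict keys that refer to the same tensor object."""
    groups = defaultdict(list)
    # keep_vars=True returns the registered Parameter/buffer objects,
    # so identical objects can be recognised by id().
    for name, tensor in model.state_dict(keep_vars=True).items():
        groups[id(tensor)].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Hypothetical example: one Linear layer registered under two positions.
shared = nn.Linear(3, 3)
demo = nn.Sequential(shared, nn.ReLU(), shared)

print(find_shared_entries(demo))
# [['0.weight', '2.weight'], ['0.bias', '2.bias']]
```

An empty result on the modified ViT would rule out sharing as the explanation; a non-empty result would show which keys are counted more than once by count_from_state_dict.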

