Why does the summary of the transfer model show no non-trainable params even though I freeze the kernel and recurrent_kernel weights of the first layer of my Keras model?


I am trying to build a transfer model using the standard Keras library. I am freezing the weights of the kernel and recurrent_kernel of the first layer of the model with the following code:

#freeze weights
for weight in modelTL.layers[0].weights:
    if 'kernel' in weight.name:
        print(weight.name)
        weight._trainable = False
        print(f"weight name: {weight.name}, Trainable: {weight.trainable}") 
    if 'recurrent_kernel' in weight.name:
        print(weight.name)
        weight._trainable = False
        print(f"weight name: {weight.name}, Trainable: {weight.trainable}") 
    elif 'bias' in weight.name:
        print(weight.name)
        weight._trainable = True
        print(f"weight name: {weight.name}, Trainable: {weight.trainable}") 

But when I check the summary of the Transfer Model, I see there are no non-trainable params.


 Layer (type)                Output Shape              Param #   
=================================================================
 trainable_lstm_62 (LSTM)    (None, 4, 10)             760       
                                                                 
 trainable_lstm_63 (LSTM)    (None, 8)                 608       
                                                                 
 trainable_dense_31 (Dense)  (None, 4)                 36        
                                                                 
=================================================================
Total params: 1,404
Trainable params: 1,404
Non-trainable params: 0
_________________________________________________________________

Also, when I print the weights of the kernel and recurrent_kernel of the first layer of the base and transfer models, I see they have changed, even though I have frozen them. Can anyone help me freeze specific weights within a specific layer of an LSTM model in standard Keras?

Here is my full code.

import random
import numpy as np
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.models import load_model

# set random seeds
np.random.seed(1)
tf.random.set_seed(1)
random.seed(1)

'base model'
# Data
train_x = np.random.rand(100, 4, 8)
train_y = np.random.rand(100, 4)
test_x = np.random.rand(30, 4, 8)
test_y = np.random.rand(30, 4)

# LSTM model
model = Sequential()
model.add(LSTM(units=10, activation='relu', input_shape=(train_x.shape[1], train_x.shape[2]), return_sequences=True, use_bias=True))
model.add(LSTM(units=8, activation='relu'))
model.add(Dense(activation='linear', units=4, use_bias=True))
model.compile(loss='mse', optimizer='Nadam', metrics=['mse','mae'] ) 

# fit model
history=model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=3, verbose=1, batch_size=64)
model.save("source_model.h5")

model.summary()

'check Base model weights'
kernelB, recurrent_kernelB, biasB = model.layers[0].weights
print ('kernelB', kernelB)
print ('recurrent_kernelB', recurrent_kernelB)
print ('biasB', biasB)


'transfer model'
# Data
train_x1 = np.random.rand(200, 4, 8)
train_y1 = np.random.rand(200, 4)
test_x1 = np.random.rand(60, 4, 8)
test_y1 = np.random.rand(60, 4)

# Load the pre-trained model
pretrained_model = load_model('source_model.h5')
pretrained_model.layers

# Initialize a new Sequential model
modelTL = Sequential()

# extract all the layers from the pre-trained model with unique names
for layer in pretrained_model.layers:
    new_layer = layer.__class__.from_config(layer.get_config())
    new_layer._name = f'trainable_{layer.name}'
    modelTL.add(new_layer)
    
    
#freeze weights
for weight in modelTL.layers[0].weights:
    if 'kernel' in weight.name:
        print(weight.name)
        weight._trainable = False
        print(f"weight name: {weight.name}, Trainable: {weight.trainable}") 
    if 'recurrent_kernel' in weight.name:
        print(weight.name)
        weight._trainable = False
        print(f"weight name: {weight.name}, Trainable: {weight.trainable}") 
    elif 'bias' in weight.name:
        print(weight.name)
        weight._trainable = True
        print(f"weight name: {weight.name}, Trainable: {weight.trainable}") 


# Compile the model
modelTL.compile(loss='mse', optimizer='Nadam', metrics=['mse', 'mae'])

# Fit the model
historyTL = modelTL.fit(train_x1, train_y1, validation_data=(test_x1, test_y1), epochs=3, verbose=1, batch_size=64)

modelTL.summary()

# Save the transfer model
modelTL.save("TL_model.h5")

'check Transfer model weights'
kernelTL, recurrent_kernelTL, biasTL = modelTL.layers[0].weights
print ('kernelTL', kernelTL)
print ('recurrent_kernelTL', recurrent_kernelTL)
print ('biasTL', biasTL)

Please note that I don't want to freeze an entire layer. I did that using modelTL.layers[0].trainable = False and it gave me non-trainable params in the summary. Now, I want to freeze only a specific weight type [either kernel (W) or recurrent_kernel (U)] within a layer.
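For reference, the whole-layer freeze that does give non-trainable params in the summary looked roughly like this (a minimal sketch):

# freeze the entire first LSTM layer; this IS reflected in model.summary()
modelTL.layers[0].trainable = False
# re-compile so the change takes effect for training
modelTL.compile(loss='mse', optimizer='Nadam', metrics=['mse', 'mae'])
modelTL.summary()  # reports the 760 params of the first LSTM layer as non-trainable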

Edit:

I have solved the problem partially.

Previously, when I was renaming the layers, I was copying only the configuration from the base model, not the weights (shown below). So, the weights of the transfer model were randomly initialized. That's why, when printing the frozen weights of the transfer model, I was getting values different from the base model.

# extract all the layers from the pre-trained model with unique names
for layer in pretrained_model.layers:
    new_layer = layer.__class__.from_config(layer.get_config())
    new_layer._name = f'trainable_{layer.name}'
    modelTL.add(new_layer)
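(For completeness, the weights could also have been copied explicitly while cloning each layer. A rough sketch, assuming the Sequential model can infer the input shape so that each new layer is already built when it is added, since set_weights only works on a built layer:)

# clone each layer's config AND copy its trained weights from the base model
for layer in pretrained_model.layers:
    new_layer = layer.__class__.from_config(layer.get_config())
    new_layer._name = f'trainable_{layer.name}'
    modelTL.add(new_layer)                      # builds the layer once the input shape is known
    new_layer.set_weights(layer.get_weights())  # copy the pre-trained weights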

I modified the code as below. Now, I get the same weights between the base and transfer models.

import random
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.models import load_model

# set random seeds
np.random.seed(1)
tf.random.set_seed(1)
random.seed(1)

'base model'
# Data
train_x = np.random.rand(100, 4, 8)
train_y = np.random.rand(100, 4)
test_x = np.random.rand(30, 4, 8)
test_y = np.random.rand(30, 4)

# LSTM model
model = Sequential()
model.add(LSTM(units=10, activation='relu', input_shape=(train_x.shape[1], train_x.shape[2]), return_sequences=True))
model.add(LSTM(units=8, activation='relu'))
model.add(Dense(units=4, activation='linear'))

# Compile the model
model.compile(loss='mse', optimizer='Nadam', metrics=['mae'] ) 

# fit model
history=model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=3, verbose=1, batch_size=64)

model.summary()
model.save("source_model.h5")


'transfer model'
# Data
train_x1 = np.random.rand(200, 4, 8)
train_y1 = np.random.rand(200, 4)
test_x1 = np.random.rand(60, 4, 8)
test_y1 = np.random.rand(60, 4)

# Load the pre-trained model
base_model = load_model('source_model.h5')

modelTL=base_model

# rename the layers of modelTL
for layer in modelTL.layers:
    layer._name = f'trainable_{layer.name}'
    print (layer.name)
    
# Freeze input kernels (W) of all three layers (both LSTM layers and the Dense layer)
for layer in modelTL.layers[0:3]:
    for weight in layer.weights:
        if 'kernel' in weight.name and 'recurrent_kernel' not in weight.name:
            weight._trainable = False
            print(f"weight name: {weight.name}, Trainable: {weight.trainable}")            
            
# Freeze recurrent weights (recurrent kernel) for LSTM layers
for layer in modelTL.layers[0:2]:
    for weight in layer.weights:
        if 'recurrent_kernel' in weight.name:
            weight._trainable = False
            print(f"weight name: {weight.name}, Trainable: {weight.trainable}") 
            
# Compile the model
modelTL.compile(loss='mse', optimizer='Nadam', metrics=['mae'])

# Fit the model
historyTL = modelTL.fit(train_x1, train_y1, validation_data=(test_x1, test_y1), epochs=3, verbose=1, batch_size=64)

modelTL.summary()
modelTL.save("TL_model.h5")

'check'
base_model_weights_1st = model.layers[0].get_weights()
transfer_model_weights_1st = modelTL.layers[0].get_weights()

base_model_weights_2nd = model.layers[1].get_weights()
transfer_model_weights_2nd = modelTL.layers[1].get_weights()

base_model_weights_3rd = model.layers[2].get_weights()
transfer_model_weights_3rd = modelTL.layers[2].get_weights()

'weight check'
if np.array_equal(base_model_weights_1st[0], transfer_model_weights_1st[0]):
    print("Weights in the first LSTM layer of the base model and transfer model are the same.")
else:
    print("Weights in the first LSTM layer of the base model and transfer model are different.")

if np.array_equal(base_model_weights_2nd[0], transfer_model_weights_2nd[0]):
    print("Weights in the second LSTM layer of the base model and transfer model are the same.")
else:
    print("Weights in the second LSTM layer of the base model and transfer model are different.")

if np.array_equal(base_model_weights_3rd[0], transfer_model_weights_3rd[0]):
    print("Weights in the third Dense layer of the base model and transfer model are the same.")
else:
    print("Weights in the third Dense layer of the base model and transfer model are different.")
    
'recurrent weight check'
if np.array_equal(base_model_weights_1st[1], transfer_model_weights_1st[1]):
    print("Recurrent weights in the first LSTM layer of the base model and transfer model are the same.")
else:
    print("Recurrent weights in the first LSTM layer of the base model and transfer model are different.")

if np.array_equal(base_model_weights_2nd[1], transfer_model_weights_2nd[1]):
    print("Recurrent weights in the second LSTM layer of the base model and transfer model are the same.")
else:
    print("Recurrent weights in the second LSTM layer of the base model and transfer model are different.")

'bias check'
if np.array_equal(base_model_weights_1st[2], transfer_model_weights_1st[2]):
    print("Biases in the first LSTM layer of the base model and transfer model are the same.")
else:
    print("Biases in the first LSTM layer of the base model and transfer model are different.")

if np.array_equal(base_model_weights_2nd[2], transfer_model_weights_2nd[2]):
    print("Biases in the second LSTM layer of the base model and transfer model are the same.")
else:
    print("Biases in the second LSTM layer of the base model and transfer model are different.")

if np.array_equal(base_model_weights_3rd[1], transfer_model_weights_3rd[1]):
    print("Biases in the third Dense layer of the base model and transfer model are the same.")
else:
    print("Biases in the third Dense layer of the base model and transfer model are different.")

Now, the weights I froze match between the base and transfer models, while the biases, which were left trainable, have changed:

Weights in the first LSTM layer of the base model and transfer model are the same.
Weights in the second LSTM layer of the base model and transfer model are the same.
Weights in the third Dense layer of the base model and transfer model are the same.
Recurrent weights in the first LSTM layer of the base model and transfer model are the same.
Recurrent weights in the second LSTM layer of the base model and transfer model are the same.
Biases in the first LSTM layer of the base model and transfer model are different.
Biases in the second LSTM layer of the base model and transfer model are different.
Biases in the third Dense layer of the base model and transfer model are different.

However, the problem remains that the summary of the transfer model doesn't show any non-trainable params. Since printing the weights gives the same values for the base and transfer models, I am ignoring this issue for now. I personally think the model.summary() function works layer-wise, so it can't dig into the separate weight types within a layer (a quick way to inspect this is sketched after the summary below).

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 trainable_lstm (LSTM)       (None, 4, 10)             760       
                                                                 
 trainable_lstm_1 (LSTM)     (None, 8)                 608       
                                                                 
 trainable_dense (Dense)     (None, 4)                 36        
                                                                 
=================================================================
Total params: 1,404
Trainable params: 1,404
Non-trainable params: 0
_________________________________________________________________
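One quick way to see where summary() gets its counts (a small inspection sketch, assuming TF2-style Keras attributes) is to compare each layer's trainable_weights list with the trainable flag on the individual variables:

# summary() derives its counts from layer.trainable_weights / layer.non_trainable_weights,
# so a variable whose _trainable flag was flipped after creation can still appear as trainable here
for layer in modelTL.layers:
    print(layer.name)
    print('  trainable_weights    :', [w.name for w in layer.trainable_weights])
    print('  non_trainable_weights:', [w.name for w in layer.non_trainable_weights])
    for w in layer.weights:
        print('  ', w.name, '-> variable.trainable =', w.trainable)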
