How can I replace a variable with another one in TensorFlow's computation graph?


Problem: I have two pretrained models with variables W1, b1 and W2, b2 saved as numpy arrays.

I want to use a mixture of these two pretrained models as the variables of my own model, and update only the mixture weights alpha1 and alpha2 during training.

To do that, I create two variables alpha1 and alpha2, load the numpy arrays, and build the mixture nodes W_new and b_new.

I want to replace W and b in the computation graph with W_new and b_new, and then train only the alpha1 and alpha2 parameters with opt.minimize(loss, var_list=[alpha1, alpha2]).

I don't know how to swap W_new and b_new into the computation graph in place of W and b. I tried assigning tf.trainable_variables()[0] = W_new, but this doesn't work: it only rebinds an element of the Python list that tf.trainable_variables() returns, and leaves the graph untouched.

I'd appreciate it if anyone could give me some clues.

Note 1: I don't want to assign values to W and b (that would only copy numbers into them and disconnect the graph from alpha1 and alpha2); I want the mixture of parameters to be part of the graph itself, so gradients can flow back to alpha1 and alpha2.
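
To make Note 1 concrete (using the names from the code below): after an assignment the values are copied, but the gradient path is gone:

sess.run(W.assign(W_new))        #copies the current mixture *values* into W
print(tf.gradients(y, alpha1))   #-> [None]: y has no graph path to alpha1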

Note 2: You might say I could simply compute y using the new variables, but the code here is just a toy example to keep things simple. In reality, instead of linear regression I have several BiLSTMs with a CRF, so I can't write the formula out by hand; I have to replace these variables inside the graph.

import tensorflow as tf
import numpy as np
np.random.seed(7)
tf.set_random_seed(7)

#define a linear regression model with 10 weights and 1 bias
with tf.variable_scope('main'):
    X = tf.placeholder(name='input', dtype=tf.float32, shape=(None, 10))
    y_gold = tf.placeholder(name='output', dtype=tf.float32, shape=(None, 1))
    W = tf.get_variable('W', shape=(10, 1))
    b = tf.get_variable('b', shape=(1,))
    y = tf.matmul(X, W) + b
    #loss = tf.losses.mean_squared_error(y_gold, y)


#numpy matrices saved from two different trained models with the exact same architecture
#(cast to float32 so they match the dtype of the TF variables)
W1 = np.random.rand(10, 1).astype(np.float32)
W2 = np.random.rand(10, 1).astype(np.float32)
b1 = np.random.rand(1).astype(np.float32)
b2 = np.random.rand(1).astype(np.float32)

with tf.variable_scope('mixture'):
    #the only parameters I actually want to train
    alpha1 = tf.get_variable('alpha1', shape=(1,))
    alpha2 = tf.get_variable('alpha2', shape=(1,))

    #mixture tensors: functions of alpha1/alpha2, not variables themselves
    W_new = alpha1 * W1 + alpha2 * W2
    b_new = alpha1 * b1 + alpha2 * b2

all_trainable_vars = tf.trainable_variables()
print(all_trainable_vars)


#attempt to replace the original W and b with the new mixture tensors (**doesn't do what I want**)
all_trainable_vars[0] = W_new
all_trainable_vars[1] = b_new
#this doesn't work: it only rebinds entries of the Python list returned by
#tf.trainable_variables(); the graph itself is unchanged

#note that I could just compute y using the new variables, as y = tf.matmul(X, W_new) + b_new,
#but the problem is this is just a toy example. In the real world my model has a big
#architecture with several bilstms whose variables I want to replace with these new ones.

#Now what I need is to replace the W and b trainable parameters (items 0 and 1
#in all_trainable_vars) with W_new and b_new in the computation graph.

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter('./graph', sess.graph)
    #print(sess.run([W, b]))
    #give the model 3 samples and predict on them
    print(sess.run(y, feed_dict={X:np.random.rand(3, 10)}))
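
One idea I've been experimenting with (an unverified sketch, so treat the tensor names as assumptions): tf.import_graph_def takes an input_map argument that splices existing tensors into an imported copy of a graph. Serializing the graph above and re-importing it, while mapping the read tensors of W and b to W_new and b_new, might give the rewiring I want. The names 'main/input:0', 'main/W/read:0', 'main/b/read:0' and 'main/add:0' are my guesses from TF's naming scheme; I'd confirm them by inspecting the graph first.

#sketch: re-import the graph, splicing the mixture tensors in where W and b were read
graph_def = tf.get_default_graph().as_graph_def()
[y_mixed] = tf.import_graph_def(
    graph_def,
    input_map={'main/input:0': X,        #reuse the original placeholder
               'main/W/read:0': W_new,   #mixture weights instead of W
               'main/b/read:0': b_new},  #mixture bias instead of b
    return_elements=['main/add:0'],      #the tensor that computes y
    name='mixed')

#y_mixed should now depend on alpha1 and alpha2, so something like
#loss = tf.losses.mean_squared_error(y_gold, y_mixed)
#train_op = tf.train.AdamOptimizer().minimize(loss, var_list=[alpha1, alpha2])
#would update only the mixing coefficients

If this works, tf.gradients(y_mixed, alpha1) should come back non-None. The import also creates unused copies of every op under the 'mixed' name scope, which is wasteful but shouldn't hurt an experiment.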

Why do I want to do this?

Assume you have several pretrained models (in different domains) but you don't have access to any of their data.

Then you have a little training data from another domain. On its own it doesn't give you much performance, but if you could train the model jointly with the data you don't have access to, you could get good performance.

Assuming the data is somehow represented in the trained models, we want to learn a mixture of the pretrained models by learning the mixing coefficients, using the little labelled data we have as supervision.

We don't want to train any of the models' parameters; we only want to learn a mix of the pretrained models. What are the mixture weights? We need to learn them from the little supervision we have.

Update 1:

I realised I could pass the parameters in when I create the model:

model = Model(W_new, b_new)

But as I said, my real model uses several tf.contrib.rnn.LSTMCell objects, so I'd need to hand the LSTMCell the new variables instead of letting it create its own. The problem now is how to set the variables of LSTMCell rather than letting it create them. I guess I'll need to subclass LSTMCell and make the changes there. Is there any easy way to do this? Maybe I should ask this as a new question.

What I want to do:

W = tf.get_variable(...)
b = tf.get_variable(...)
#note: LSTMCell has no such constructor signature -- this is the API I wish existed
cell_fw = tf.contrib.rnn.LSTMCell(W, b, state_is_tuple=True)

I've created a separate question for this here, because it might be useful to others for different reasons.
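
In the meantime, one possibility that might avoid subclassing entirely: tf.variable_scope accepts a custom_getter, which intercepts every tf.get_variable call made inside the scope and can return an arbitrary tensor in place of the variable. The sketch below assumes kernel_mix and bias_mix are mixture tensors built like W_new and b_new, that inputs is my real input sequence, and that the variable names end with 'lstm_cell/kernel' and 'lstm_cell/bias' (the actual names depend on the TF version; I'd confirm them by printing tf.trainable_variables()):

def mixture_getter(getter, name, *args, **kwargs):
    #return mixture tensors in place of the cell's own kernel and bias;
    #the name suffixes are assumptions -- check tf.trainable_variables()
    if name.endswith('lstm_cell/kernel'):
        return kernel_mix
    if name.endswith('lstm_cell/bias'):
        return bias_mix
    return getter(name, *args, **kwargs)  #all other variables: default behaviour

with tf.variable_scope('rnn', custom_getter=mixture_getter):
    cell_fw = tf.contrib.rnn.LSTMCell(num_units=128, state_is_tuple=True)
    outputs, states = tf.nn.dynamic_rnn(cell_fw, inputs, dtype=tf.float32)

If that works, opt.minimize(loss, var_list=[alpha1, alpha2]) would update only the mixing coefficients, since the kernel and bias are then plain tensors rather than variables.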
