Variable.assign(value) on Multi-GPU with TensorFlow 2


I have a model that works perfectly on a single GPU as follows:

alpha = tf.Variable(alpha,
                    name='ws_alpha',
                    trainable=False,
                    dtype=tf.float32,
                    aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA,
                   )

...
class CustomModel(tf.keras.Model):

    @tf.function
    def train_step(self, inputs):  # method: takes self
        ...
        alpha.assign_add(increment)

...


model.fit(dataset, epochs=10)

However, when I run on multiple GPUs, the assignment is not applied: it works for the first two training steps, and then the value stays the same for the rest of the epoch.

The alpha is used for a weighted sum of two layers, e.g. out = a*Layer1 + (1-a)*Layer2. It is not a trainable parameter, but something akin to a step_count variable.
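For context, the weighted sum described above can be sketched like this. The layer outputs are replaced with constant stand-in tensors for illustration; the shapes and values are hypothetical:

```python
import tensorflow as tf

# Non-trainable mixing coefficient, as in the question.
alpha = tf.Variable(0.25, name="ws_alpha", trainable=False, dtype=tf.float32)

# Hypothetical stand-ins for the two layer outputs.
layer1_out = tf.ones([2, 2])
layer2_out = tf.zeros([2, 2])

# Weighted sum: out = a * Layer1 + (1 - a) * Layer2
out = alpha * layer1_out + (1.0 - alpha) * layer2_out
```

In a real model the two tensors would come from calling the layers inside `call()`, but the blending arithmetic is the same.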

Has anyone had experience with assigning individual values to a variable in a multi-GPU setting in TensorFlow 2?

Would it be better to assign the variable as:

with tf.device("CPU:0"):
    alpha = tf.Variable(...)

?

1 Answer

Answered by Simon Thomas

Simple fix, as per the TensorFlow issue tracker: also set synchronization=tf.VariableSynchronization.ON_READ on the variable:

alpha = tf.Variable(alpha,
                    name='ws_alpha',
                    trainable=False,
                    dtype=tf.float32,
                    aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA,
                    synchronization=tf.VariableSynchronization.ON_READ,
                   )
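With synchronization=ON_READ, each replica keeps its own local copy and assign_add is applied per replica without cross-replica synchronization; combined with ONLY_FIRST_REPLICA aggregation, reading the variable in cross-replica context returns the first replica's value. A minimal sketch of the fixed variable definition, exercised outside a strategy scope so it also runs on a CPU-only machine (the initial value and increment are illustrative):

```python
import tensorflow as tf

# The variable definition from the answer: non-trainable, first-replica
# aggregation, and ON_READ synchronization.
alpha = tf.Variable(
    0.0,
    name="ws_alpha",
    trainable=False,
    dtype=tf.float32,
    aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA,
    synchronization=tf.VariableSynchronization.ON_READ,
)

# The per-step update from train_step; with ON_READ this is applied
# locally on each replica instead of being synchronized on write.
alpha.assign_add(0.1)
```

The same definition can be created under `tf.distribute.MirroredStrategy().scope()`; there, `assign_add` inside `train_step` updates each replica's copy independently.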