I have a model that works perfectly on a single GPU as follows:
alpha = tf.Variable(alpha,
                    name='ws_alpha',
                    trainable=False,
                    dtype=tf.float32,
                    aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA,
                    )
...
class CustomModel(tf.keras.Model):
    @tf.function
    def train_step(self, inputs):
        ...
        alpha.assign_add(increment)
        ...
model.fit(dataset, epochs=10)
However, when I run on multiple GPUs, the assignment is not applied: alpha updates for the first two training steps and then stays the same for the rest of the epoch.
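For reference, the multi-GPU run is set up roughly like this. This is a minimal sketch, not my exact code: it assumes tf.distribute.MirroredStrategy and uses a toy dataset and a stripped-down train_step just to drive the variable update.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Non-trainable blend factor; only the first replica's value is kept.
    alpha = tf.Variable(0.0, name='ws_alpha', trainable=False,
                        dtype=tf.float32,
                        aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA)

    class CustomModel(tf.keras.Model):
        def train_step(self, inputs):
            # Real training logic elided; only the per-step update matters here.
            alpha.assign_add(0.01)
            return {'ws_alpha': alpha.value()}

    ins = tf.keras.Input(shape=(4,))
    outs = tf.keras.layers.Dense(1)(ins)
    model = CustomModel(ins, outs)
    model.compile(optimizer='adam')

# Toy data just to drive train_step.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((64, 4)), tf.random.normal((64, 1)))).batch(8)
model.fit(dataset, epochs=1)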
The alpha is used for a weighted sum of two layers, e.g. out = a*Layer1 + (1-a)*Layer2. It is not a trainable parameter, but something akin to a step_count variable.
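For concreteness, the blend happens in a small custom layer along these lines. This is a sketch for illustration only; the class name WeightedSum and its wiring are placeholders, not my actual code.

import tensorflow as tf

class WeightedSum(tf.keras.layers.Layer):
    """Blends two inputs: out = a * x1 + (1 - a) * x2, with non-trainable a."""
    def __init__(self, alpha_var, **kwargs):
        super().__init__(**kwargs)
        self.alpha = alpha_var  # the ws_alpha variable defined above

    def call(self, inputs):
        x1, x2 = inputs
        return self.alpha * x1 + (1.0 - self.alpha) * x2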
Has anyone had experience with assigning individual variable values in a multi-GPU setting in TensorFlow 2?
Would it be better to create the variable on the CPU, e.g.:

with tf.device("CPU:0"):
    alpha = tf.Variable(...)

?
Simple fix, as per the TensorFlow issues:
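A minimal sketch of that fix, assuming it is the CPU-placement workaround suggested in the question (the initial value 0.0 is a placeholder):

import tensorflow as tf

# Pin the variable to the host CPU so every replica reads and updates
# a single copy, instead of one mirrored copy per GPU.
with tf.device("CPU:0"):
    alpha = tf.Variable(0.0,
                        name='ws_alpha',
                        trainable=False,
                        dtype=tf.float32,
                        aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA)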