I have a classification model that uses LSTMs to process sequential data. It trains perfectly when I enable eager mode with this command:
tf.config.experimental_run_functions_eagerly(True)
When I turn it off, the model computes the loss for the first minibatch without any problem, but then produces NaN loss from the second minibatch onward.
I use TensorFlow 2.1.0 on Windows. The optimizer is Adam and the minibatch size is 48. I am sure the LSTM is the problem, because when I replace it with a Dense layer the problem goes away; I also tried GRU and got the same behavior. I don't want to post all the code here because it is very specific to my project, but here are the relevant parts:
Initialization:
self.x_enc_hidden1 = LSTM(64, return_state=True)
self.x_enc_hidden2 = LSTM(64, return_sequences=True)
self.x_enc_pool = GlobalMaxPooling1D()
self.x_enc_mean = Dense(self.cfg.z_dim)
self.x_enc_var = Dense(self.cfg.z_dim)
Computation:
zx = self.x_enc_hidden2(x)
zx, h, c = self.x_enc_hidden1(zx)
h = tf.concat([h, c], axis=-1)
mean = self.x_enc_mean(h)
logvar = self.x_enc_var(h)
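To make the shapes clear, here is a minimal standalone sketch of the same encoder path. The input sequence length, feature dimension, and z_dim below are made up for illustration; they are not the values from my actual project:

```python
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense

z_dim = 16                 # illustrative value, not my real cfg.z_dim
seq_len, feat_dim = 20, 8  # illustrative input dimensions

x_enc_hidden1 = LSTM(64, return_state=True)
x_enc_hidden2 = LSTM(64, return_sequences=True)
x_enc_mean = Dense(z_dim)
x_enc_var = Dense(z_dim)

x = tf.random.normal((48, seq_len, feat_dim))  # one minibatch of 48 sequences
zx = x_enc_hidden2(x)                          # (48, seq_len, 64)
zx, h, c = x_enc_hidden1(zx)                   # output, hidden, cell: each (48, 64)
h = tf.concat([h, c], axis=-1)                 # (48, 128)
mean = x_enc_mean(h)                           # (48, z_dim)
logvar = x_enc_var(h)                          # (48, z_dim)
```

This standalone version runs without NaNs for me; the problem only appears inside the full training loop when eager mode is off.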