I had a working LSTM model with a single weight/bias layer mapping its recurrent state to the output. I then coded up a two-layer variant of the same system: the LSTM, then a hidden layer, then the output. I wrote the lines that define the weights and biases for this two-layer model, but never used them anywhere. Yet with those variables defined and unused, the model would not learn! My weights and biases are defined like so:
weights = {
    # if going straight from PLSTM output to x,y prediction
    'out': tf.Variable(tf.random_normal([FLAGS.n_hidden, n_out], stddev=1/FLAGS.n_hidden, dtype=tf.float32)),
    # if fully connected feed-forward hidden layer between PLSTM output and x,y prediction
    'outHidden1': tf.Variable(tf.random_normal([FLAGS.n_hidden, FLAGS.n_middle], dtype=tf.float32)),
    'outHidden2': tf.Variable(tf.random_normal([FLAGS.n_middle, n_out], dtype=tf.float32))
}
biases = {
    # if going straight from PLSTM output to x,y prediction
    'out': tf.Variable(tf.random_normal([n_out], dtype=tf.float32)),
    # if fully connected feed-forward hidden layer between PLSTM output and x,y prediction
    'outHidden1': tf.Variable(tf.random_normal([FLAGS.n_middle], dtype=tf.float32)),
    'outHidden2': tf.Variable(tf.random_normal([n_out], dtype=tf.float32))
}
So the two-layer weights and biases were defined, but never used in training or testing.
The only place the weights and biases enter the model is a single line:
return tf.matmul(relevant, weights['out']) + biases['out']
where relevant is the LSTM output. So I am only using the 'out' variables in the weight and bias dictionaries.
With this setup the model wouldn't learn anything. Then, once I commented out the two-layer variables like this:
weights = {
    # if going straight from PLSTM output to x,y prediction
    'out': tf.Variable(tf.random_normal([FLAGS.n_hidden, n_out], stddev=1/FLAGS.n_hidden, dtype=tf.float32)),
    # if fully connected feed-forward hidden layer between PLSTM output and x,y prediction
    # 'outHidden1': tf.Variable(tf.random_normal([FLAGS.n_hidden, FLAGS.n_middle], dtype=tf.float32)),
    # 'outHidden2': tf.Variable(tf.random_normal([FLAGS.n_middle, n_out], dtype=tf.float32))
}
biases = {
    # if going straight from PLSTM output to x,y prediction
    'out': tf.Variable(tf.random_normal([n_out], dtype=tf.float32)),
    # if fully connected feed-forward hidden layer between PLSTM output and x,y prediction
    # 'outHidden1': tf.Variable(tf.random_normal([FLAGS.n_middle], dtype=tf.float32)),
    # 'outHidden2': tf.Variable(tf.random_normal([n_out], dtype=tf.float32))
}
...it started working again. How does the mere existence of those variables impede learning? I initialize them, but no gradients should flow through them, and backprop shouldn't touch those unused variables at all. Or am I misunderstanding something?
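A minimal sketch for checking the assumption that no gradients reach the unused variables. It is not my actual model: the sizes, the variable names (w_out, b_out, ...), the placeholders standing in for the LSTM output and the targets, the squared-error loss and the gradient-descent optimizer are all hypothetical stand-ins. It also assumes a TensorFlow build that ships the tf.compat.v1 API (1.14 or later).

```python
# Assumption: TensorFlow with the v1 compatibility API (TF >= 1.14 or TF 2.x).
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.reset_default_graph()

# Hypothetical sizes standing in for FLAGS.n_hidden, FLAGS.n_middle and n_out.
n_hidden, n_middle, n_out = 8, 4, 2

weights = {
    # 1.0 / n_hidden forces float division regardless of Python version.
    'out': tf.Variable(tf.random_normal([n_hidden, n_out], stddev=1.0 / n_hidden), name='w_out'),
    'outHidden1': tf.Variable(tf.random_normal([n_hidden, n_middle]), name='w_outHidden1'),
    'outHidden2': tf.Variable(tf.random_normal([n_middle, n_out]), name='w_outHidden2'),
}
biases = {
    'out': tf.Variable(tf.random_normal([n_out]), name='b_out'),
    'outHidden1': tf.Variable(tf.random_normal([n_middle]), name='b_outHidden1'),
    'outHidden2': tf.Variable(tf.random_normal([n_out]), name='b_outHidden2'),
}

# Placeholders standing in for the LSTM output ("relevant") and the targets.
relevant = tf.placeholder(tf.float32, [None, n_hidden])
target = tf.placeholder(tf.float32, [None, n_out])

# Only the 'out' variables are used, as in the single-layer model.
prediction = tf.matmul(relevant, weights['out']) + biases['out']
loss = tf.reduce_mean(tf.square(prediction - target))

# compute_gradients pairs every trainable variable with its gradient tensor;
# the gradient is None when the loss does not depend on that variable.
optimizer = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = optimizer.compute_gradients(loss)

connected = sorted(v.op.name for g, v in grads_and_vars if g is not None)
unconnected = sorted(v.op.name for g, v in grads_and_vars if g is None)

print('receive gradients:', connected)
print('no gradient path: ', unconnected)
```

If the outHidden variables show up in the second list here, backprop really does leave them alone, and the difference between the two runs would have to come from somewhere other than gradients, for example the initial values or a different optimizer setup than the one sketched.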