I'm currently trying to learn Sonnet.
My network (incomplete, the question is based on this):
import numpy as np
import sonnet as snt
import tensorflow as tf

class Model(snt.AbstractModule):

    def __init__(self, name="LSTMNetwork"):
        super(Model, self).__init__(name=name)
        with self._enter_variable_scope():
            self.l1 = snt.LSTM(100)
            self.l2 = snt.LSTM(100)
            self.out = snt.LSTM(10)

    def _build(self, inputs):
        # 'inputs' is of shape (batch_size, input_length)
        # I need it to be of shape (batch_size, sequence_length, input_length)
        l1_state = self.l1.initial_state(np.shape(inputs)[0])   # init with batch_size
        l2_state = self.l2.initial_state(np.shape(inputs)[0])   # init with batch_size
        out_state = self.out.initial_state(np.shape(inputs)[0])

        l1_out, l1_state = self.l1(inputs, l1_state)
        l1_out = tf.tanh(l1_out)
        l2_out, l2_state = self.l2(l1_out, l2_state)
        l2_out = tf.tanh(l2_out)
        output, out_state = self.out(l2_out, out_state)
        output = tf.sigmoid(output)
        return output, out_state
In other frameworks (e.g. Keras), LSTM inputs are of the form (batch_size, sequence_length, input_length).

However, the Sonnet documentation states that the input to Sonnet's LSTM is of the form (batch_size, input_length).

How do I use Sonnet's LSTMs for sequential input?
So far, I've tried using a for loop inside _build, iterating over each timestep, but that gives seemingly random outputs.
I've tried the same architecture in Keras, which runs without any issues.
I'm executing in eager mode, using GradientTape for training.
We generally wrote the RNNs in Sonnet to work on a single-timestep basis, because for Reinforcement Learning you often need to run one timestep to pick an action, and without that action you can't get the next observation (and the next input timestep) from the environment. It's easy to unroll a single-timestep module over a sequence using tf.nn.dynamic_rnn (see below). We also have a wrapper which takes care of composing several RNN cores per timestep, which I believe is what you're looking to do. This has the advantage that the DeepRNN object supports the start state methods required for dynamic_rnn, so it's API-compatible with LSTM or any other single-timestep module. What you want to do should be achievable like this:
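For instance, something along these lines (a minimal sketch rather than exact code: the batch size, sequence length, input size and the skip_connections=False setting are illustrative assumptions):

import sonnet as snt
import tensorflow as tf

# Illustrative shapes only; a dummy input batch stands in for real data.
batch_size, sequence_length, input_length = 32, 20, 50
input_sequence = tf.zeros([batch_size, sequence_length, input_length])

# Compose the per-timestep cores (and elementwise ops) into a single core.
deep_rnn = snt.DeepRNN(
    [snt.LSTM(100), tf.tanh, snt.LSTM(100), tf.tanh, snt.LSTM(10), tf.sigmoid],
    skip_connections=False)

# DeepRNN exposes initial_state, so it can be unrolled with dynamic_rnn.
initial_state = deep_rnn.initial_state(batch_size=batch_size)

# 'output_sequence' has shape (batch_size, sequence_length, 10).
output_sequence, final_state = tf.nn.dynamic_rnn(
    deep_rnn, input_sequence, initial_state=initial_state, time_major=False)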
A few things to note - first, if you haven't already, please have a look at the RNN example in the repository, as it shows a full graph-mode training setup around a fairly similar model.
Secondly, if you do end up needing to implement a more complex module than DeepRNN allows for, it's important to thread the recurrent state in and out of the module. In your example you're creating the input state internally, and the l1_state and l2_state outputs are effectively discarded, so the model can't be properly trained. If DeepRNN wasn't available, your model would look like this:
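As a rough sketch (the tuple-of-states layout and the initial_state helper below are illustrative assumptions, not exact code), the key change is that _build takes the previous state as an argument and returns the updated state:

import sonnet as snt
import tensorflow as tf

class Model(snt.AbstractModule):

    def __init__(self, name="LSTMNetwork"):
        super(Model, self).__init__(name=name)
        with self._enter_variable_scope():
            self.l1 = snt.LSTM(100)
            self.l2 = snt.LSTM(100)
            self.out = snt.LSTM(10)

    def initial_state(self, batch_size):
        # The caller owns the state; _build no longer creates it internally.
        return (self.l1.initial_state(batch_size),
                self.l2.initial_state(batch_size),
                self.out.initial_state(batch_size))

    def _build(self, inputs, prev_state):
        # 'inputs' is a single timestep of shape (batch_size, input_length);
        # 'prev_state' comes from initial_state() or the previous call.
        l1_state, l2_state, out_state = prev_state

        l1_out, next_l1_state = self.l1(inputs, l1_state)
        l1_out = tf.tanh(l1_out)
        l2_out, next_l2_state = self.l2(l1_out, l2_state)
        l2_out = tf.tanh(l2_out)
        output, next_out_state = self.out(l2_out, out_state)
        output = tf.sigmoid(output)

        # Return the updated state so it can be fed back in at the next
        # timestep (to be usable with dynamic_rnn you would additionally
        # subclass snt.RNNCore and define state_size/output_size).
        return output, (next_l1_state, next_l2_state, next_out_state)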
Finally, if you're using eager mode I would strongly encourage you to have a look at Sonnet 2 - it's a complete rewrite for TF 2 / Eager mode. It's not backwards compatible, but all the same kinds of module compositions are possible. Sonnet 1 was written primarily for Graph mode TF, and while it does work with Eager mode, you'll probably encounter some things that aren't very convenient.

We worked closely with the TensorFlow team to make sure that TF 2 and Sonnet 2 work nicely together, so please have a look: https://github.com/deepmind/sonnet/tree/v2. Sonnet 2 should be considered alpha and is being actively developed, so we don't have loads of examples yet, but more will be added in the near future.
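As a taste of what the same composition might look like in Sonnet 2, a hedged sketch (it assumes snt.LSTM, snt.DeepRNN and snt.dynamic_unroll from the v2 API, that DeepRNN still accepts plain callables between cores, and that the unroll expects time-major input; shapes are illustrative):

import sonnet as snt
import tensorflow as tf

batch_size, sequence_length, input_length = 32, 20, 50  # illustrative only

# Dummy time-major input: [sequence_length, batch_size, input_length].
input_sequence = tf.zeros([sequence_length, batch_size, input_length])

# One core per timestep, composed from LSTMs and elementwise ops.
core = snt.DeepRNN(
    [snt.LSTM(100), tf.tanh, snt.LSTM(100), tf.tanh, snt.LSTM(10), tf.sigmoid])
initial_state = core.initial_state(batch_size)

# Unroll the core over the whole sequence.
output_sequence, final_state = snt.dynamic_unroll(core, input_sequence, initial_state)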