Dynamic tensor shape for tensorflow RNN


I'm trying a very simple example of a TensorFlow RNN. In that example, I use a dynamic RNN. The code is as follows:

import tensorflow as tf

data = tf.placeholder(tf.float32, [None, 10, 1])  # batch size, sequence length, dimension of each input
target = tf.placeholder(tf.float32, [None, 11])
num_hidden = 24
cell = tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True)
val, _ = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)
val = tf.transpose(val, [1, 0, 2])  # [sequence length, batch size, num_hidden]
last = tf.gather(val, int(val.get_shape()[0]) - 1)  # output at the last time step, via the static shape
weight = tf.Variable(tf.truncated_normal([num_hidden, int(target.get_shape()[1])]))
bias = tf.Variable(tf.constant(0.1, shape=[target.get_shape()[1]]))
prediction = tf.nn.softmax(tf.matmul(last, weight) + bias)
cross_entropy = -tf.reduce_sum(target * tf.log(tf.clip_by_value(prediction, 1e-10, 1.0)))
optimizer = tf.train.AdamOptimizer()
minimize = optimizer.minimize(cross_entropy)
mistakes = tf.not_equal(tf.argmax(target, 1), tf.argmax(prediction, 1))
error = tf.reduce_mean(tf.cast(mistakes, tf.float32))

Actually, the code is taken from this tutorial. The input to this RNN network is a sequence of binary numbers, each wrapped in its own array. For example, a sequence has the format:

[[1],[0],[0],[1],[1],[0],[1],[1],[1],[0]]

The shape of the input is [None, 10, 1], which corresponds to batch size, sequence length, and embedding size, respectively. Now, because a dynamic RNN can accept inputs of variable shape, I changed the code as follows:

data = tf.placeholder(tf.float32, [None, None, 1])

Basically, I want to use variable-length sequences (of course, the same length for all sequences within a batch, but varying between batches). However, it throws the error:

Traceback (most recent call last):
  File "rnn-lstm-variable-length.py", line 48, in <module>
    last = tf.gather(val, int(val.get_shape()[0]) - 1)
TypeError: __int__ returned non-int (type NoneType)

I understand that the second dimension is None, so after the transpose val.get_shape()[0] is also None and cannot be converted to an int. However, I believe there must be a way to overcome this, because RNNs accept variable-length inputs in general. How can I do it?


There are 2 answers

Mark McDonald (BEST ANSWER)

tf.gather accepts a tensor as its index, so you can use tf.shape(val) to get the shape of val as a tensor computed at run time, e.g. tf.gather(val, tf.shape(val)[0] - 1).
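
For example, a minimal sketch (TF 1.x API, reusing val from the question's graph):

# Select the last time step using the run-time shape instead of the
# static shape, so it works even when the time dimension is None.
val = tf.transpose(val, [1, 0, 2])           # [time, batch, num_hidden]
last = tf.gather(val, tf.shape(val)[0] - 1)  # index computed at run time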

Mark McDonald

tl;dr: try using tf.train.batch(..., dynamic_pad=True) to batch your data.


@chris_anderson's comment is correct. Ultimately your network needs a dense matrix of numbers to work with, and there are a couple of strategies to convert variable-length data into hyperrectangles:

  1. Pad all batches to a fixed size (e.g. assume a maximum length of, say, 500 items per input, and pad every item in every batch to 500). There is nothing dynamic about this strategy.
  2. Apply padding per-batch, to the length of the longest item in the batch (dynamic padding; see the numpy sketch after this list).
  3. Bucket your input based on length and apply padding per-batch. This is the same as #2, but with less overall padding.
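
To make strategy #2 concrete, here is a minimal numpy sketch (the [length, 1] layout matches the question's sequences; the data is illustrative):

import numpy as np

# Pad every sequence in the batch to the length of the longest one.
batch = [np.ones((3, 1)), np.ones((7, 1)), np.ones((5, 1))]  # variable lengths
max_len = max(seq.shape[0] for seq in batch)
padded = np.stack([
    np.pad(seq, ((0, max_len - seq.shape[0]), (0, 0)), mode="constant")
    for seq in batch
])
print(padded.shape)  # (3, 7, 1): batch, longest length, input dimension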

There are other strategies that you could use too.

To do this batching, you use:

  1. tf.train.batch - by default it does no padding; you need to implement the padding yourself.
  2. tf.train.batch(..., dynamic_pad=True) - see the sketch after this list.
  3. tf.contrib.training.bucket_by_sequence_length
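
For option #2, a hedged sketch of a TF 1.x queue-based input pipeline (the per-example sequence here is synthetic, purely to show the shapes; in practice it would come from a reader):

import tensorflow as tf

# Hypothetical per-example tensor: a [length, 1] sequence whose length
# varies from example to example.
length = tf.random_uniform([], minval=1, maxval=11, dtype=tf.int32)
sequence = tf.ones([length, 1], dtype=tf.float32)
sequence.set_shape([None, 1])  # rank must be known for batching

# dynamic_pad=True pads each dequeued batch to the longest sequence in
# that batch, so `data` has shape [4, None, 1].
data = tf.train.batch([sequence], batch_size=4, dynamic_pad=True)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    print(sess.run(data).shape)  # e.g. (4, 9, 1); varies per batch
    coord.request_stop()
    coord.join(threads)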

I suspect you're also confused by the use of tf.nn.dynamic_rnn. It's important to note that the dynamic in dynamic_rnn refers to the way that TensorFlow unrolls the recurrent part of the network. In tf.nn.rnn, the recurrence is done statically in the graph (there is no internal loop; it's unrolled at graph construction time). In dynamic_rnn, however, TensorFlow uses tf.while_loop to iterate inside the graph at run time. To use dynamic padding, you need dynamic unrolling, but dynamic unrolling does not give you dynamic padding automatically.
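
To see the difference, a minimal sketch (TF 1.x; note that tf.nn.rnn was later renamed tf.nn.static_rnn in the 1.x API):

import tensorflow as tf

cell = tf.nn.rnn_cell.LSTMCell(24, state_is_tuple=True)

# Static unrolling: one copy of the cell per time step is baked into the
# graph, so the sequence length (10 here) must be fixed up front.
fixed = tf.placeholder(tf.float32, [None, 10, 1])
steps = tf.unstack(fixed, axis=1)  # list of 10 tensors of shape [None, 1]
with tf.variable_scope("static"):
    static_out, _ = tf.nn.static_rnn(cell, steps, dtype=tf.float32)

# Dynamic unrolling: a single tf.while_loop runs when the graph executes,
# so the time dimension of the input placeholder can be None.
variable = tf.placeholder(tf.float32, [None, None, 1])
with tf.variable_scope("dynamic"):
    dynamic_out, _ = tf.nn.dynamic_rnn(cell, variable, dtype=tf.float32)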