cudnnRNNForwardTraining seqLength / xDesc usage

889 views Asked by At

Let's say I have N sequences x[i], each with length seqLength[i] for 0 <= i < N. As far as I understand from the cuDNN docs, they have to be ordered by sequence length, the longest first, so assume that seqLength[i] >= seqLength[i+1]. Let's say that they have the feature dimension D, so x[i] is a 2D tensor of shape (seqLength[i], D). As far as I understand, I should prepare a tensor x where all x[i] are contiguously behind each other, i.e. it would be of shape (sum(seqLength), D).

According to the cuDNN docs, the functions cudnnRNNForwardInference / cudnnRNNForwardTraining gets the argument int seqLength and cudnnTensorDescriptor_t* xDesc, where:

seqLength: Number of iterations to unroll over.

xDesc: Array of tensor descriptors. Each must have the same second dimension. The first dimension may decrease from element n to element n + 1 but may not increase.

I'm not exactly sure I understand this correctly. Is seqLength my max(seqLength)?

And xDesc is an array. Of what length? max(seqLength)? If so, I assume that it describes one batch of features for each frame but some of the later frames will have less sequences in it. It sounds like the number of sequences per frame is described in the first dimension. So:

xDesc[t].shape[0] = len([i for i in range(N) if t < seqLength[i]])

for all 0 <= t < max(seqLength). I.e. 0 <= xDesc[t].shape[0] <= N.

How much dimensions does each xDesc[t] describe, i.e. what is len(xDesc[t].shape)? I would assume that it is 2 and the second dimension is the feature dimension, i.e. D, i.e.:

xDesc[t].shape = (len(...), D)

The strides would have to be set accordingly, although it's also not totally clear. If x is stored in row-major order, then

xDesc[0].strides[0] = D * xDesc[0].shape[0]
xDesc[0].strides[1] = 1

But how does cuDNN compute the offset for frame t? I guess it will keep track and thus calculate sum([xDesc[t2].strides[0] for t2 in range(t)]).

Most example code I have seen assume that all sequences are of the same length. Also they all describe 3 dimensions per xDesc[t], not 2. Why is that? The third dimension is always 1, as well as the stride of the second and third dimension, and the stride for the first dimension is N. So this assumes that the tensor x is row-major ordered and of shape (max(seqLength), N, D). The code is actually a bit strange. E.g. from TensorFlow:

int dims[] = {batch_size, data_size, 1};
int strides[] = {dims[1] * dims[2], dims[2], 1};
cudnnSetTensorNdDescriptor(
    ...,
    sizeof(dims) / sizeof(dims[0]) /*nbDims*/, dims /*dimA*/,
    strides /*strideA*/);

The code looks really similar in all examples I have found. Search for cudnnSetTensorNdDescriptor or cudnnRNNForwardTraining. E.g.:

I found one example which can handle sequences of different length. Again search for cudnnSetTensorNdDescriptor:

That claims that there must be 3 dimensions for every xDesc[t]. It has the comment:

these dimensions are what CUDNN expects: (the minibatch dimension, the data dimension, and the number 1 (because each descriptor describes one frame of data)

Edit: Support for this was added now end of 2018 for PyTorch, in this commit.

Am I missing something from the cuDNN documentation? I really have not found that information in it.

My question is basically, is my conclusion about how to set the arguments x, seqLength and xDesc for cudnnRNNForwardInference / cudnnRNNForwardTraining correct, and also my implicit assumptions, or if not, how would I use it, how does the memory layout look like, etc.?

0

There are 0 answers