Is there any other reason why we make sequence length the same using padding?

Is there any reason to pad sequences to the same length, other than to make matrix multiplication (and therefore parallel computation) possible?

There is 1 answer

Umang Gupta On BEST ANSWER

It may depend on the specific situation you are dealing with, but in general the only reason I would zero-pad (or pad in any other way) the input to an RNN is to make batch-wise computation work. Padding should also be done in a way that doesn't affect the results: the padded positions should not contribute to the hidden states that you use for downstream tasks. For example, if a sequence has t real steps and the batch is padded to length T, you may pad the end of that sequence over {t+1:T}, but any further task or processing should then use only h{0:t}.
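
To make the point about end padding concrete, here is a minimal sketch in plain Python. It assumes a toy scalar RNN with made-up weights (`run_rnn`, `w_x` and `w_h` are hypothetical names, not from any library). It suggests that the hidden states at the real positions are unchanged by padding at the end, while the state after the padded steps has drifted, which is why you would read the state at the last real step rather than the final one.

```python
import math

def run_rnn(xs, w_x=0.5, w_h=0.8, h0=0.0):
    """Toy scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1}). Returns every hidden state."""
    hs, h = [], h0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        hs.append(h)
    return hs

seq = [1.0, -2.0, 0.5]                      # real sequence with 3 steps
T = 6                                       # length the batch is padded to
padded = seq + [0.0] * (T - len(seq))       # zero padding at the end

hs_real = run_rnn(seq)
hs_padded = run_rnn(padded)

# Hidden states at the real positions are identical with or without padding.
same_prefix = hs_padded[:len(seq)] == hs_real

# The state after the padded steps is different, because the recurrent
# weight keeps updating h even when the input is zero.
drifted = hs_padded[-1] != hs_real[-1]

# So for downstream use, take the state at the last real step.
last_valid = hs_padded[len(seq) - 1]
```

With these assumptions, `last_valid` matches the final state of the unpadded run, whereas `hs_padded[-1]` does not.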

However, if you are using anything other than a simple RNN (e.g. a bidirectional RNN), padding can get complicated. For example, for the forward direction you would pad at the end, while for the reverse direction you would want to pad the front of the sequences.
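
One way to picture the bidirectional case is the sketch below, in plain Python with a hypothetical helper `reverse_valid` and a made-up `PAD` value. It assumes the batch is padded at the end and that the true lengths are known. Flipping each padded row naively puts the padding first, so the reverse direction would consume padding before any real token. Reversing only the valid part keeps the padding after the real tokens, which is one way to get the effect of front padding for the reverse direction.

```python
PAD = 0  # hypothetical padding value

def reverse_valid(padded, lengths):
    """Reverse only the first `n` real items of each row; padding stays at the end."""
    return [row[:n][::-1] + row[n:] for row, n in zip(padded, lengths)]

batch = [
    [1, 2, 3, 4],
    [5, 6, PAD, PAD],
    [7, 8, 9, PAD],
]
lengths = [4, 2, 3]

# Naive flip: the reverse direction would read the padding first.
naive = [row[::-1] for row in batch]

# Length-aware flip: real tokens come first, padding last.
length_aware = reverse_valid(batch, lengths)
```

Here `naive[1]` starts with two padding values, while `length_aware[1]` starts with the real tokens in reversed order.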

Even for batching or parallel computation, PyTorch has packed sequences, which should be faster than padding IMO.
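
As a rough illustration of packed sequences, the sketch below uses `pad_sequence`, `pack_padded_sequence` and `pad_packed_sequence` from `torch.nn.utils.rnn` with a small `nn.RNN`. The tensor sizes and the toy data are assumptions made up for the example, and it makes no claim about speed. It only shows that the RNN skips the padded steps, so `h_n` already holds the state at each sequence's last real step.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Three toy sequences of different lengths; each step is a 4-dim feature vector.
seqs = [torch.randn(5, 4), torch.randn(2, 4), torch.randn(3, 4)]
lengths = torch.tensor([len(s) for s in seqs])

# Zero-pad at the end to a common length: shape (batch=3, time=5, features=4).
padded = pad_sequence(seqs, batch_first=True)

# Pack so the RNN only processes the real steps of each sequence.
# enforce_sorted=False lets us keep the batch in its original order.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
packed_out, h_n = rnn(packed)

# Unpack back to a padded tensor: shape (3, 5, 8); padded positions are zeros.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# Hidden state at the last real step of each sequence, gathered by length.
last_valid = out[torch.arange(len(seqs)), lengths - 1]
```

With packing, `h_n[0]` and `last_valid` should agree, so there is no need to index by length by hand to avoid the padded steps.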