This release of PyTorch seems to provide `PackedSequence` for variable-length inputs to recurrent neural networks. However, I found it a bit hard to use correctly.
Using `pad_packed_sequence` to recover the output of an RNN layer that was fed by `pack_padded_sequence`, we get a `T x B x N` tensor `outputs`, where `T` is the maximum number of time steps, `B` is the batch size, and `N` is the hidden size. I found that for the shorter sequences in the batch, the outputs after their last time step are all zeros.
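For concreteness, here is a minimal sketch of what I mean, with hypothetical sizes and the default time-major layout (`batch_first=False`):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

T, B, I, N = 5, 3, 4, 6   # hypothetical: max time steps, batch, input size, hidden size
lengths = [5, 3, 2]       # per-sequence lengths, sorted in decreasing order

x = torch.randn(T, B, I)  # padded input batch, T x B x I
rnn = nn.RNN(I, N)

packed = pack_padded_sequence(x, lengths)
packed_out, h_n = rnn(packed)
outputs, _ = pad_packed_sequence(packed_out)  # outputs: T x B x N

# Time steps past each sequence's length come back as zeros; e.g. the
# second sequence has length 3, so outputs[3:, 1] is all zeros.
print(outputs[3:, 1].abs().sum())  # tensor(0.)
```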
Here are my questions.
- For a single-output task where one needs the last output of every sequence, simply taking `outputs[-1]` gives a wrong result, since that slice contains lots of zeros for the shorter sequences. One has to construct indices from the sequence lengths to fetch the individual last output of each sequence (see the first sketch after this list). Is there a simpler way to do this?
- For a multiple-output task (e.g. seq2seq), one usually adds an `N x O` linear layer, reshapes the batch outputs from `T x B x O` to `TB x O`, and computes the cross-entropy loss against the `TB` true targets (usually integers, as in a language model); see the second sketch below. In this situation, do the zeros in the batch output matter?
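For the first question, this gather-based indexing is the workaround I currently use, continuing the sketch above (the construction is my own, not something from the docs):

```python
# Fetch the last valid output of each sequence from outputs (T x B x N),
# using the same lengths that were passed to pack_padded_sequence.
idx = (torch.tensor(lengths) - 1).view(1, B, 1).expand(1, B, N)  # 1 x B x N
last_outputs = outputs.gather(0, idx).squeeze(0)                 # B x N
```

Unlike `outputs[-1]`, this picks row `lengths[b] - 1` for each batch element `b`.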
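And for the second question, this is the setup I have in mind, with a hypothetical output size `O` and random targets just for illustration:

```python
import torch.nn.functional as F

O = 10                                   # hypothetical number of output classes
proj = nn.Linear(N, O)

logits = proj(outputs.view(T * B, N))    # reshape T x B x N -> TB x N, project to TB x O
targets = torch.randint(0, O, (T * B,))  # hypothetical integer targets, length TB
loss = F.cross_entropy(logits, targets)  # the all-zero padded rows are included here
```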