I am developing a sequence-to-sequence model (paper) for text generation. I am not using teacher forcing on the decoder side, i.e. the output of the decoder at time t0 is fed as the input to the decoder at time t1.
Now, in practice, the output of the decoder (LSTM/GRU) is passed through a Dense layer, which in turn produces the index of the predicted word, and that index is treated as the decoder's output.
But when feeding this output back into the next step, should we feed h_t (i.e. the decoder output / hidden state of the decoder), or is the word embedding of the predicted word the correct choice? A rough sketch of what I mean is below.
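To make the question concrete, here is a minimal greedy decoding loop, assuming a PyTorch GRU decoder with made-up sizes (vocab_size, emb_dim, hidden_dim) and a hypothetical sos_index; this is not my actual model, just an illustration of the two options I am asking about.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 10000, 128, 256
sos_index = 1  # hypothetical start-of-sequence token id

embedding = nn.Embedding(vocab_size, emb_dim)
decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
dense = nn.Linear(hidden_dim, vocab_size)   # the Dense layer producing word scores

h_t = torch.zeros(1, 1, hidden_dim)         # decoder hidden state (e.g. taken from the encoder)
token = torch.tensor([[sos_index]])         # first input token

outputs = []
for _ in range(20):                         # decode up to 20 steps
    x_t = embedding(token)                  # option B: embed the previously predicted word
    out, h_t = decoder(x_t, h_t)            # h_t is carried forward as the hidden state
    logits = dense(out.squeeze(1))          # Dense layer over the vocabulary
    token = logits.argmax(dim=-1, keepdim=True)  # predicted word index (no teacher forcing)
    outputs.append(token.item())

# Option A would instead feed h_t (or `out`) directly as the next step's input,
# which would also require the GRU input size to be hidden_dim rather than emb_dim.
```

In this sketch the hidden state is always passed along through the GRU anyway; my question is only about what should be used as the *input* x_t at each step.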