
I am developing a sequence-to-sequence model (paper) for text generation. I am not using teacher forcing on the decoder side, i.e. the output of the decoder at time t0 is fed as the input to the decoder at time t1.

Now, in reality, the output of the decoder (LSTM/GRU) is passed through a Dense layer, which in turn generates the index of the word; that index is considered the output of the decoder.

But when feeding the output to the next step, should we feed h_t (i.e. the output of the decoder / the hidden state of the decoder), or is the word embedding of the predicted word the correct choice?
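A minimal sketch of the setup I am describing (PyTorch is used only for illustration; the layer names and sizes below are placeholders, not from a real model):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
vocab_size, embed_dim, hidden_dim = 10000, 256, 512

embedding = nn.Embedding(vocab_size, embed_dim)
decoder_cell = nn.GRUCell(embed_dim, hidden_dim)
dense = nn.Linear(hidden_dim, vocab_size)   # the "Dense" layer over the vocabulary

# One decoder step: the GRU output h_t is projected to vocabulary logits,
# and argmax gives the index of the predicted word.
x_t = embedding(torch.tensor([1]))          # embedding of the current input token
h_t = decoder_cell(x_t, torch.zeros(1, hidden_dim))
logits = dense(h_t)
word_index = logits.argmax(dim=-1)          # the word index treated as the decoder output
```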

1 Answer


The short answer is: probably both, but the hidden state h_t is essential.

Feeding the hidden state h_t is required to pass information about the entire sentence (not just the previous word) from one decoder step to the next.

Feeding the embedding of the chosen word is not essential, but it is probably a good idea. It allows the decoder to condition on the choices it has already made.
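A minimal sketch of a greedy decoding loop along these lines (PyTorch is assumed only for illustration; the sizes and token indices are placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical sizes and tokens for illustration only.
vocab_size, embed_dim, hidden_dim = 10000, 256, 512
SOS_INDEX, MAX_LEN = 1, 20

embedding = nn.Embedding(vocab_size, embed_dim)
decoder_cell = nn.GRUCell(embed_dim, hidden_dim)
dense = nn.Linear(hidden_dim, vocab_size)

# h_t would normally start from the encoder's final state; zeros are a stand-in here.
h_t = torch.zeros(1, hidden_dim)
prev_word = torch.tensor([SOS_INDEX])

generated = []
for _ in range(MAX_LEN):
    x_t = embedding(prev_word)          # embedding of the previously chosen word
    h_t = decoder_cell(x_t, h_t)        # hidden state is carried from step to step
    logits = dense(h_t)
    prev_word = logits.argmax(dim=-1)   # greedy choice; fed back at the next step
    generated.append(prev_word.item())
```

Here the hidden state h_t is threaded through every step (the essential part), while the embedding of the word chosen at the previous step serves as the input x_t (the "probably a good idea" part).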
