I am developing a sequence-to-sequence model (paper) for text generation. I am not using teacher forcing on the decoder side, i.e. the output of the decoder at time t0 is fed as the input to the decoder at time t1.
Now, in practice, the output of the decoder (LSTM/GRU) is passed through a Dense layer, which in turn produces the index of the predicted word, and that index is treated as the decoder's output.
But when feeding this output back into the next step, should we feed h_t (i.e. the decoder output / hidden state of the decoder), or is the word embedding of the predicted word the correct choice? A rough sketch of what I mean is below.
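To make the question concrete, here is a minimal greedy decoding loop, assuming a PyTorch GRU decoder with made-up sizes (vocab_size, emb_dim, hidden_dim) and a hypothetical sos_index; this is not my actual model, just an illustration of the two options I am asking about.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 10000, 128, 256
sos_index = 1  # hypothetical start-of-sequence token id

embedding = nn.Embedding(vocab_size, emb_dim)
decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
dense = nn.Linear(hidden_dim, vocab_size)   # the Dense layer producing word scores

h_t = torch.zeros(1, 1, hidden_dim)         # decoder hidden state (e.g. taken from the encoder)
token = torch.tensor([[sos_index]])         # first input token

outputs = []
for _ in range(20):                         # decode up to 20 steps
    x_t = embedding(token)                  # option B: embed the previously predicted word
    out, h_t = decoder(x_t, h_t)            # h_t is carried forward as the hidden state
    logits = dense(out.squeeze(1))          # Dense layer over the vocabulary
    token = logits.argmax(dim=-1, keepdim=True)  # predicted word index (no teacher forcing)
    outputs.append(token.item())

# Option A would instead feed h_t (or `out`) directly as the next step's input,
# which would also require the GRU input size to be hidden_dim rather than emb_dim.
```

In this sketch the hidden state is always passed along through the GRU anyway; my question is only about what should be used as the *input* x_t at each step.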