
I'm quite new to using LSTMs in PyTorch. I'm trying to create a model that takes a sequence of 62 tensors, each of size 42 (so the shape is [62, 42]). Call this the input tensor.

From it I want to predict a sequence of 8 tensors, each of size 1 (so the shape is [8, 1]). Call this the label tensor.

The connection between those tensors is this: the input tensor is made of columns A1 A2 A3 ...... A42, while the label tensor contains only A3.

What I'm trying to say is that, if needed, the label tensor can be padded with zeros in all places other than the value of A3, so that it reaches a length of 42.

How can I do this? From what I'm reading in the PyTorch documentation I can only predict at the same ratio (1 point predicts 1 point), while I want to predict a sequence of 8 tensors of size 1 from a sequence of 62 tensors of size 42. Is it doable? Do I need to pad the predicted tensor from size 1 to size 42? Thanks!

A good solution would be to use seq2seq, for example.

  • Can you clarify your question? Are you saying that each input consists of 42 sequences of length 62? And then each output would be 42 sequences of length 8? Commented Dec 29, 2019 at 19:47
  • @EvanWeissburg I'm saying that each sequence has a length of 42. I have 62 sequences each time, and I want to use all of this to predict a tensor of size 8. Commented Dec 29, 2019 at 20:55
  • So you're aiming to predict 62 sequences of length 42 given a single tensor of length 8? Commented Dec 29, 2019 at 20:58
  • @EvanWeissburg The opposite, the label is this single tensor with size 8 Commented Dec 29, 2019 at 21:28
  • How exactly do you expect to map between these 62 sequences of length 42 and the single vector of length 8? Commented Dec 29, 2019 at 21:33

1 Answer


If I understand your question correctly, given a sequence of length 62 you want to predict a sequence of length 8, in the sense that the order of your outputs matters (this is the case if you are doing some time series forecasting). In that case a seq2seq model is a good choice; here is a tutorial for it: link. Globally, you need to implement an encoder and a decoder; here is an example of such an implementation:

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class EncoderRNN(nn.Module):
    def __init__(self, input_dim=42, hidden_dim=100):
        super(EncoderRNN, self).__init__()
        self.hidden_dim = hidden_dim

        # expects input of shape (seq_len, batch, input_dim), e.g. (62, 1, 42)
        self.lstm = nn.LSTM(input_dim, hidden_dim)

    def forward(self, input, hidden):
        output, hidden = self.lstm(input, hidden)
        return output, hidden

    def initHidden(self):
        # an LSTM needs an (h_0, c_0) pair, each of shape (num_layers, batch, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim, device=device),
                torch.zeros(1, 1, self.hidden_dim, device=device))


class DecoderRNN(nn.Module):
    def __init__(self, hidden_dim, output_dim):
        super(DecoderRNN, self).__init__()
        self.hidden_dim = hidden_dim

        self.lstm = nn.LSTM(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, output_dim)
        # for a regression-style forecast of a continuous value you could drop this softmax
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        output, hidden = self.lstm(input, hidden)
        output = self.softmax(self.out(output[0]))
        return output, hidden

    def initHidden(self):
        # same (h_0, c_0) pair as the encoder
        return (torch.zeros(1, 1, self.hidden_dim, device=device),
                torch.zeros(1, 1, self.hidden_dim, device=device))
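
In case it helps, here is a minimal sketch of how the two modules could be chained for the [62, 42] → [8, 1] shapes from the question. It is only a shape/wiring check (batch size 1, no training loop, no teacher forcing, and feeding the encoder's last output at every decoder step is just one simple choice; the variable names are illustrative):

encoder = EncoderRNN(input_dim=42, hidden_dim=100).to(device)
decoder = DecoderRNN(hidden_dim=100, output_dim=1).to(device)

x = torch.randn(62, 1, 42, device=device)                # input sequence: (seq_len=62, batch=1, features=42)

enc_out, enc_hidden = encoder(x, encoder.initHidden())   # enc_out: (62, 1, 100)

dec_input = enc_out[-1:]     # last encoder output as the decoder input: (1, 1, 100)
dec_hidden = enc_hidden      # hand the encoder state over to the decoder
predictions = []
for _ in range(8):           # generate the 8 output steps one at a time
    step_out, dec_hidden = decoder(dec_input, dec_hidden)   # step_out: (1, 1)
    predictions.append(step_out)

y_hat = torch.cat(predictions, dim=0)    # (8, 1), the same shape as the label tensor
print(y_hat.shape)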

If the order of your 8 outputs has no importance, then you can simply add a linear layer with 8 units after the LSTM layer. You can use this code directly in that case:

class Net(nn.Module):
    def __init__(self, hidden_dim=100, input_dim=42, output_size=8):
        super(Net, self).__init__()
        self.hidden_dim = hidden_dim

        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)

        # The linear layer that maps from the hidden state space to the 8 outputs
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, seq):
        lstm_out, _ = self.lstm(seq)             # lstm_out: (batch, seq_len, hidden_dim)
        output = self.fc(lstm_out[:, -1, :])     # use the last time step -> (batch, 8)
        return output
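
A quick shape check for this model (the random batch below is purely illustrative):

x = torch.randn(4, 62, 42)    # a batch of 4 sequences, each (62, 42), with batch_first=True
model = Net()
print(model(x).shape)         # torch.Size([4, 8]): one size-8 prediction per sequence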

9 Comments

Thank you, I do need something like seq2seq, but I'm having a hard time doing it. Could you please show me how you would do it using the data I wrote about?
The simplest seq2seq model you can use is an encoder-decoder architecture; the tutorial on this link gives you a detailed implementation. But globally you need to create two recurrent networks, an encoder and a decoder. I have added an example of a possible implementation of each of them to my answer.
This implementation uses embedding layers and other things that are not relevant for my goal, and I'm a bit stuck with using this tutorial. Any chance you could help me with that, or at least add it to your answer? I will accept it as the answer if you could do it, and even add a bounty...
Yes, I have added some modifications to the answer so that it can work with a time series forecasting problem, and not specifically with an NLP problem. In fact the two problems are equivalent; the only difference is that in the NLP case you need an embedding layer that transforms every word into a vector, but in your case you already have those vectors, so you don't need this embedding layer.
Note that if you want to use the code of the encoder-decoder model I have written, you may need to introduce some modifications that depend on the way you are doing the training. Also note that the first model I wrote will probably give results equivalent to the encoder-decoder architecture if you are working on time series forecasting.
