I am trying to build an LSTM based Seq2Seq model in PyTorch for multivariate multistep prediction.

[Figure: Data]

The data used is shown in the figure above, where the last column is the target and all the preceding columns are features. For preprocessing, I use MinMaxScaler to scale all data between -1 and 1.
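A minimal sketch of that scaling step with scikit-learn (the array here is just a placeholder for the real dataset, which isn't shown):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Placeholder for the real dataset: features plus the target in the last column.
data = np.random.rand(1000, 8)

# Scale every column to [-1, 1]; fit on the training split only to avoid leakage.
scaler = MinMaxScaler(feature_range=(-1, 1))
train, test = data[:800], data[800:]
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)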

[Figure: Features and Target]

Then I used an Encoder-Decoder structure.

class Seq2Seq(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, batch_size):
        super().__init__()
        self.output_size = output_size
        self.Encoder = Encoder(input_size, hidden_size, num_layers, batch_size)
        self.Decoder = Decoder(input_size, hidden_size,
                               num_layers, output_size, batch_size)

    def forward(self, input_seq):
        batch_size, seq_len, _ = input_seq.shape
        # Encode the full input sequence into final hidden and cell states.
        h, c = self.Encoder(input_seq)
        outputs = torch.zeros(batch_size, seq_len, self.output_size).to(device)
        # Decode step by step, feeding the input features at each time step.
        for t in range(seq_len):
            _input = input_seq[:, t, :]
            output, h, c = self.Decoder(_input, h, c)
            outputs[:, t, :] = output

        # Only the last time step's prediction is returned.
        return outputs[:, -1, :]
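The Encoder and Decoder classes aren't shown here; a minimal pair consistent with the calls above might look like this sketch (an assumption about the structure, not the actual code):

class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, batch_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, input_seq):
        # Run the full sequence; keep only the final hidden and cell states.
        _, (h, c) = self.lstm(input_seq)
        return h, c

class Decoder(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, batch_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, _input, h, c):
        # _input has shape (batch, input_size); add a length-1 time dimension.
        out, (h, c) = self.lstm(_input.unsqueeze(1), (h, c))
        return self.fc(out.squeeze(1)), h, c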

The Training

def seq2seq_train(model, Dtr, Val, path):
    loss_function = nn.MSELoss().to(device)
    # loss_function = nn.L1Loss().to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                                 weight_decay=1e-4)
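The rest of the training function is omitted above. A typical continuation, assuming Dtr and Val are DataLoaders yielding (x, y) batches and path is a checkpoint file, might look like this sketch:

    best_val = float('inf')
    for epoch in range(100):
        model.train()
        for x, y in Dtr:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_function(model(x), y)
            loss.backward()
            optimizer.step()

        # Validation pass; keep the checkpoint with the lowest validation loss.
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_function(model(x.to(device)), y.to(device)).item()
                           for x, y in Val) / len(Val)
        if val_loss < best_val:
            best_val = val_loss
            torch.save(model.state_dict(), path)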

After 100 epochs of training, the obtained losses and test results are as follows.

[Figure: Loss History]

[Figure: Test Result]

The validation loss doesn't seem to drop, and the predictions look poor.

Then I used Optuna to optimize hyperparameters, including the number of hidden units, the number of LSTM layers, dropout, etc., but the results are still not good: every trial ends with a high validation loss.
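The actual search code isn't shown; a typical Optuna objective for this setup would look roughly like the following sketch (the search ranges and the train_and_validate helper are hypothetical placeholders):

import optuna

def objective(trial):
    # Hypothetical search space; the ranges actually used are not shown.
    hidden_size = trial.suggest_int('hidden_size', 32, 256)
    num_layers = trial.suggest_int('num_layers', 1, 3)
    lr = trial.suggest_float('lr', 1e-4, 1e-2, log=True)

    model = Seq2Seq(input_size, hidden_size, num_layers,
                    output_size, batch_size).to(device)
    # train_and_validate stands in for a training run returning validation loss.
    return train_and_validate(model, Dtr, Val, lr)

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)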

I would like to know what is causing this result: is it a problem with the data, the model structure, or the hyperparameters?

Any help would be much appreciated, thank you.

  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. Define the problem and the expected outcome. Also provide a description of the data and clearly define what you are doing to it (augmentation/preprocessing). Commented Oct 4, 2022 at 16:14
  • @joehoeller Thank you for your advice. I have edited my question. I hope you can understand the problem and help me. Thank you. Commented Oct 6, 2022 at 11:47
  • You did not define the problem you are trying to solve. Please re-read what I wrote, more info is required to solve for this. Commented Oct 6, 2022 at 13:49

1 Answer


Tentative answer based on info provided:

Note that when cross-entropy loss is used for classification, as is usually done, bad predictions are penalized much more strongly than good predictions are rewarded. For a cat image, the loss is −log(prediction), where prediction is the predicted probability of the true class, so even if many cat images are correctly predicted (low loss), a single badly misclassified cat image will have a very high loss, hence "blowing up" your mean loss. See this answer for further illustration of this phenomenon. (Increasing loss with stable accuracy could also be caused by good predictions being classified a little worse, but I find it less likely because of this loss "asymmetry".)
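A quick numeric illustration of that asymmetry, taking prediction as the predicted probability of the true class:

import math

# Many confident correct predictions barely move the mean loss...
good = -math.log(0.95)   # ~0.05
# ...but one confident wrong prediction dominates it.
bad = -math.log(0.01)    # ~4.61

# Mean over 99 good predictions and 1 bad one:
mean_loss = (99 * good + bad) / 100
print(mean_loss)  # ~0.097, nearly double the all-good mean of ~0.05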

So I think that when both accuracy and loss are increasing, the network is starting to overfit, and both phenomena are happening at the same time. The network starts to learn patterns that are only relevant for the training set and do not generalize well, so some images from the validation set get predicted really wrong, with the effect amplified by the loss "asymmetry". At the same time, it is still learning some patterns which are useful for generalization ("good learning"), as more and more images are being correctly classified.

There is also a great explanation in this Tweet that concisely explains why you may encounter validation loss being lower than training loss.

