
My dataset looks like the following:

[dataset screenshot: molecule strings (inputs) on the left, target values (outputs) on the right]

On the left are my inputs, and on the right the outputs. The inputs are tokenized and converted to lists of indices; for instance, the molecule input 'CC1(C)Oc2ccc(cc2[C@H]([C@@H]1O)N3CCCC3=O)C#N' is converted to:

[28, 28, 53, 69, 28, 70, 40, 2, 54, 2, 2, 2, 69, 2, 2, 54, 67, 28, 73, 33, 68, 69, 67, 28, 73, 73, 33, 68, 53, 40, 70, 39, 55, 28, 28, 28, 28, 55, 62, 40, 70, 28, 63, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

I use the following list of characters as my map from characters to indices:

cs = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
      'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
      '0','1','2','3','4','5','6','7','8','9',
      '=','#',':','+','-','[',']','(',')','/','\\','@','.','%']

Thus, every character in the input string maps to an index, and if the input string is shorter than the maximum input length (which is 100), I pad it with zeros, as in the example shown above.
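A minimal sketch of that encoding (the encode helper name is just illustrative):

max_len = 100

def encode(smiles, cs, max_len=max_len):
    # map each character to its position in cs ...
    idx = [cs.index(ch) for ch in smiles]
    # ... then pad with zeros up to the maximum input length
    return idx + [0] * (max_len - len(idx))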

My model looks like this:

import torch
import torch.nn as nn

class LSTM_regr(torch.nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, 1)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x, l):
        x = self.embeddings(x)
        x = self.dropout(x)
        lstm_out, (ht, ct) = self.lstm(x)
        # l (the true sequence lengths) is not used here
        return self.linear(ht[-1])

vocab_size = 76
model = LSTM_regr(vocab_size, 20, 256)
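For reference, a quick shape check with random indices and made-up lengths (purely illustrative) gives the expected (batch, 1) output:

x = torch.randint(1, vocab_size, (4, 100))   # hypothetical batch of 4 padded sequences
l = torch.tensor([44, 60, 32, 100])          # hypothetical true lengths
print(model(x, l).shape)                     # torch.Size([4, 1])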

My problem is that after training, the model gives the same output (e.g., 3.3318) for every input I test it on. Why is that?

My training loop:

import torch.nn.functional as F

def train_model_regr(model, epochs=10, lr=0.001):
    parameters = filter(lambda p: p.requires_grad, model.parameters())
    optimizer = torch.optim.Adam(parameters, lr=lr)
    for i in range(epochs):
        model.train()
        sum_loss = 0.0
        total = 0
        for x, y, l in train_dl:  # train_dl: DataLoader yielding (inputs, targets, lengths)
            x = x.long()
            y = y.float()
            y_pred = model(x, l)
            optimizer.zero_grad()
            loss = F.mse_loss(y_pred, y.unsqueeze(-1))
            loss.backward()
            optimizer.step()
            sum_loss += loss.item() * y.shape[0]
            total += y.shape[0]
        # average training loss for this epoch: sum_loss / total

EDIT:

I figured it out: I reduced the learning rate from 0.01 to 0.0005 and reduced the batch size from 100 to 10, and it worked fine.

I think this makes sense: with a large batch size, the model was learning to always output the mean of the targets, since a constant equal to the mean is what minimizes the MSE loss.
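To illustrate (with made-up targets, just for this check): fitting a single constant prediction with MSE converges to the mean of the targets.

import torch
import torch.nn.functional as F

y = torch.tensor([1.0, 2.0, 4.0, 7.0])    # made-up targets
c = torch.zeros(1, requires_grad=True)     # a single constant "prediction"
opt = torch.optim.SGD([c], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = F.mse_loss(c.expand_as(y), y)
    loss.backward()
    opt.step()
print(c.item(), y.mean().item())           # both come out ≈ 3.5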

2 Answers


Your LSTM_regr returns the last hidden state regardless of the true sequence length. That is, if your true sequence is of length 3 and x is padded to length 100, the output is the last hidden state after the LSTM has processed 97 padding elements.

You should compute the loss for the prediction that matches the true length of each sequence.
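One common way to do this in PyTorch is to pack the padded batch with pack_padded_sequence, so that ht[-1] corresponds to the last real token of each sequence rather than the last padding element. As a rough sketch (assuming the l in your forward signature is a tensor of the true lengths), the forward method could look like:

from torch.nn.utils.rnn import pack_padded_sequence

def forward(self, x, l):
    x = self.dropout(self.embeddings(x))
    # pack so the LSTM ignores the zero padding beyond each true length
    packed = pack_padded_sequence(x, l.cpu(), batch_first=True, enforce_sorted=False)
    _, (ht, ct) = self.lstm(packed)
    # ht[-1] is now the hidden state at the last real token of each sequence
    return self.linear(ht[-1])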


I figured it out: I reduced the learning rate from 0.01 to 0.0005 and reduced the batch size from 100 to 10, and it worked fine.

I think this makes sense: with a large batch size, the model was learning to always output the mean of the targets, since a constant equal to the mean is what minimizes the MSE loss.

