
It looks like PyTorch's Transformer layers give non-reproducible outputs. This happens on both CPU and GPU. I know this can sometimes be caused by parallel computation on the GPU.

import torch
import torch.nn as nn

device = 'cpu'  # the same behavior shows up on 'cuda' as well

emb = nn.Embedding(10, 12).to(device)
inp1 = torch.LongTensor([1, 2, 3, 4]).to(device)
inp1 = emb(inp1).reshape(inp1.shape[0], 1, 12)  # shape (S, N, E) = (seq_len, batch, d_model)

encoder_layer = nn.TransformerEncoderLayer(d_model=12, nhead=4)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)

out1 = transformer_encoder(inp1)
out2 = transformer_encoder(inp1)

out1 and out2 are different. It could be multiprocessing on the CPU, but the results look too shaky for that. How can I fix this?

1 Answer


nn.TransformerEncoderLayer has a default dropout rate of 0.1. While the model is in training mode, the elements to be dropped are re-randomized on every forward pass, so two calls with the same input produce different outputs.
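As a quick illustration (a minimal standalone sketch, not tied to the model above), the same behavior shows up with a bare nn.Dropout layer in training mode:

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.1)  # same default rate as nn.TransformerEncoderLayer
x = torch.ones(8)
print(drop(x))  # some elements zeroed, the rest scaled by 1/(1 - 0.1)
print(drop(x))  # a different random mask on the second call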

If you want to train the model with dropout, just leave this behavior alone during training and call model.eval() at test time.
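For example, a minimal sketch reusing the variables from the question, with the default dropout left at 0.1:

transformer_encoder.eval()   # disables dropout (and other train-only behavior)
with torch.no_grad():
    out1 = transformer_encoder(inp1)
    out2 = transformer_encoder(inp1)
print((out1 - out2).norm())  # tensor(0.)
transformer_encoder.train()  # switch back before further training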

If you want to disable this randomness during training as well, set dropout=0 like so:

nn.TransformerEncoderLayer(d_model=12, nhead=4, dropout=0)

Full testing script:

import torch
import torch.nn as nn

device = 'cpu'

emb = nn.Embedding(10, 12).to(device)
inp1 = torch.LongTensor([1, 2, 3, 4]).to(device)
inp1 = emb(inp1).reshape(inp1.shape[0], 1, 12)  # shape (S, N, E) = (seq_len, batch, d_model)

encoder_layer = nn.TransformerEncoderLayer(d_model=12, nhead=4, dropout=0).to(device)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=4).to(device)

out1 = transformer_encoder(inp1)
out2 = transformer_encoder(inp1)

print((out1 - out2).norm())  # prints tensor(0.) now that dropout is disabled