It looks like PyTorch's Transformer layers give non-reproducible outputs. This happens on both CPU and GPU. I know this can sometimes happen because of parallel computation on the GPU.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

emb = nn.Embedding(10, 12).to(device)
inp1 = torch.LongTensor([1, 2, 3, 4]).to(device)
inp1 = emb(inp1).reshape(inp1.shape[0], 1, 12)  # (S, N, E)
encoder_layer = nn.TransformerEncoderLayer(d_model=12, nhead=4)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=4).to(device)
out1 = transformer_encoder(inp1)
out2 = transformer_encoder(inp1)
out1 and out2 are different. It could be multiprocessing on the CPU, but the results look too shaky for that. How can I fix this?
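For reference, here is a minimal check I am considering, continuing from the snippet above and assuming the nondeterminism comes from dropout (nn.TransformerEncoderLayer defaults to dropout=0.1, and modules start in training mode, so each forward pass would drop different activations):

transformer_encoder.eval()  # disables dropout, so repeated forward passes should match
with torch.no_grad():
    out1 = transformer_encoder(inp1)
    out2 = transformer_encoder(inp1)
print(torch.allclose(out1, out2))  # I would expect True if dropout is the cause

Is switching to eval mode the right way to get reproducible outputs here, or is there still GPU/CPU parallelism to worry about?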