It looks like PyTorch's Transformer layers give non-reproducible outputs. This happens on both CPU and GPU. I know this can sometimes happen because of parallel computation on the GPU.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

emb = nn.Embedding(10, 12).to(device)
inp1 = torch.LongTensor([1, 2, 3, 4]).to(device)
inp1 = emb(inp1).reshape(inp1.shape[0], 1, 12)  # (S, N, E)
encoder_layer = nn.TransformerEncoderLayer(d_model=12, nhead=4)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=4).to(device)
out1 = transformer_encoder(inp1)
out2 = transformer_encoder(inp1)
out1 and out2 are different. It could be multiprocessing on the CPU, but the results look too shaky for that. How can I fix this?
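For reference, here is a minimal check I am considering, continuing from the snippet above and assuming the nondeterminism comes from dropout (nn.TransformerEncoderLayer defaults to dropout=0.1, and modules start in training mode, so each forward pass would drop different activations):

transformer_encoder.eval()  # disables dropout, so repeated forward passes should match
with torch.no_grad():
    out1 = transformer_encoder(inp1)
    out2 = transformer_encoder(inp1)
print(torch.allclose(out1, out2))  # I would expect True if dropout is the cause

Is switching to eval mode the right way to get reproducible outputs here, or is there still GPU/CPU parallelism to worry about?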