I have created a DNN model with PyTorch (input_dim=6, output_dim=150). Normally, if I feed it a random X_in = torch.randn(6000, 6), it returns model_out with shape (6000, 150), and I would expect the rank of model_out to be 150 (since the model's weights and biases are also randomly initialised).
However, the following code shows this is NOT TRUE:
import torch
import torch.nn as nn
torch.manual_seed(923) # for reproducible result
class MyDNN(nn.Module):
    def __init__(self):
        super(MyDNN, self).__init__()
        # layer 0:
        self.linear_0 = nn.Linear(6, 150)
        self.activ_0 = nn.Tanh()
        # layer 1:
        self.linear_1 = nn.Linear(150, 150)
        self.activ_1 = nn.Tanh()
        # layer 2:
        self.linear_2 = nn.Linear(150, 150)
        self.activ_2 = nn.Tanh()
        # layer 3:
        self.linear_3 = nn.Linear(150, 150)
        self.activ_3 = nn.Tanh()

    def forward(self, x):
        out = self.activ_0(self.linear_0(x))    # output: layer 0
        out = self.activ_1(self.linear_1(out))  # output: layer 1
        out = self.activ_2(self.linear_2(out))  # output: layer 2
        out = self.activ_3(self.linear_3(out))  # output: layer 3
        return out
model = MyDNN()
X_in = torch.randn(6000, 6, dtype=torch.float32)
with torch.no_grad():
    model_out = model(X_in)
print(f'model_out rank = {torch.linalg.matrix_rank(model_out)}')
This prints model_out rank = 115. Apparently this is a WRONG output; there is no way the output has that many linearly dependent columns when all the inputs, weights, and biases are randomly initialised!
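If I understand the torch.linalg.matrix_rank docs correctly, the rank is determined from an SVD by counting singular values above a tolerance that scales with the dtype's machine epsilon (my reading of the default is rtol = max(m, n) * eps). To see whether the columns are exactly dependent or just falling below that cutoff, I also inspected the singular value spectrum with this sketch:

S = torch.linalg.svdvals(model_out)         # singular values, sorted descending
eps = torch.finfo(model_out.dtype).eps      # ~1.19e-07 for float32
tol = S.max() * max(model_out.shape) * eps  # default cutoff, per my reading of the docs
print(f'smallest singular value = {S.min().item():.3e}')
print(f'default tolerance       = {tol.item():.3e}')
print(f'count above tolerance   = {(S > tol).sum().item()}')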
The problem goes away if I change both the X_in dtype and the model dtype to float64:
model_64 = MyDNN()
model_64.double()
X_in_64 = torch.randn(6000, 6, dtype=torch.float64)
with torch.no_grad():
    model_64_out = model_64(X_in_64)
print(f'model_64_out rank = {torch.linalg.matrix_rank(model_64_out)}')
This prints model_64_out rank = 150, as expected.
Here are my questions:
- Why does this happen? Is this really just a precision problem? I mean, float32 already has decent precision. Actually, when I use my own training data, even with mini_batch_size = 10 (so output.shape = (10, 150)), Rank(output) is less than 10.
- Although the problem can be solved by using double precision, this slows down the whole training process a lot (and the Mac M1 Pro GPU only supports float32). Is there any other solution? A sketch of the kind of workaround I have in mind follows below.
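For reference, this is the kind of cheaper workaround I am wondering about: keep the model and training in float32 and only change how the rank itself is computed, either by casting the output to float64 for the check or by passing an explicit rtol. Both variants below are untested sketches, and I am not sure either gives a numerically meaningful rank:

# Hypothetical: float32 forward pass, rank computed in float64 / with explicit rtol.
with torch.no_grad():
    out32 = model(X_in)                                  # plain float32 output
rank_cast = torch.linalg.matrix_rank(out32.double())     # cast only for the rank check
rank_rtol = torch.linalg.matrix_rank(out32, rtol=1e-12)  # cutoff far below the float32 default
print(f'rank via float64 cast  = {rank_cast}')
print(f'rank via explicit rtol = {rank_rtol}')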