
I have created a DNN model with PyTorch (input_dim=6, output_dim=150). Normally, if I generate a random X_in = torch.randn(6000, 6), it returns model_out.shape = (6000, 150), and if I compute the rank of model_out, it should be 150 (since my model's weights and biases are also randomly initialised).

However, you can see this is NOT TRUE with the following code:

import torch
import torch.nn as nn

torch.manual_seed(923) # for reproducible result

class MyDNN(nn.Module):
    def __init__(self):
        super(MyDNN, self).__init__()
        # layer 0:
        self.linear_0 = nn.Linear(6, 150)
        self.activ_0 = nn.Tanh()
        # layer 1:
        self.linear_1 = nn.Linear(150, 150)
        self.activ_1 = nn.Tanh()
        # layer 2:
        self.linear_2 = nn.Linear(150, 150)
        self.activ_2 = nn.Tanh()
        # layer 3:
        self.linear_3 = nn.Linear(150, 150)
        self.activ_3 = nn.Tanh()

    def forward(self, x):
        out = self.activ_0(self.linear_0(x)) # output: layer 0
        out = self.activ_1(self.linear_1(out)) # output: layer 1
        out = self.activ_2(self.linear_2(out)) # output: layer 2
        out = self.activ_3(self.linear_3(out)) # output: layer 3
        return out

model = MyDNN()
X_in = torch.randn(6000, 6, dtype=torch.float32)
with torch.no_grad():
    model_out = model(X_in)
print(f'model_out rank = {torch.linalg.matrix_rank(model_out)}')

model_out rank = 115. Apparently this is a WRONG output; there is no way the output has so many linearly dependent columns when all the inputs, weights and biases are randomly initialised!

This problem can be solved by changing the dtype of both X_in and the model to float64, as in the following code:

model_64 = MyDNN()
model_64.double()
X_in_64 = torch.randn(6000, 6, dtype=torch.float64)
with torch.no_grad():
    model_64_out = model_64(X_in_64)
print(f'model_64_out rank = {torch.linalg.matrix_rank(model_64_out)}')

model_64_out rank = 150

Here are my questions:

  1. Why does this happen? Is this really a problem of the data type's size? I mean, float32 already has good precision. Actually, when I use my own training data, even with mini_batch_size = 10 -> output.shape = (10, 150), my rank(output) is less than 10.
  2. Although this problem can be solved by using double precision, it slows down the whole training process a lot (and the Mac M1 Pro GPU only supports float32). Is there any other solution?

1 Answer


You have to realize that we are dealing with a numerical problem here: the rank of a matrix is a discrete value derived from, e.g., a singular value decomposition in the case of torch.linalg.matrix_rank. In this case we need to consider a threshold on the singular values: below what magnitude tol do we consider a singular value to be exactly zero?
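
As a minimal sketch of what this boils down to (assuming the float32 model_out from the question's snippet is still in scope), you can look at the singular values directly and count how many lie above a chosen cut-off:

import torch

with torch.no_grad():
    sv = torch.linalg.svdvals(model_out)  # singular values, in descending order
print(f'largest  singular value: {sv[0]:.3e}')
print(f'smallest singular value: {sv[-1]:.3e}')

tol = 1e-6  # the cut-off is a choice, not a given
print(f'singular values above tol: {(sv > tol).sum()}')

The smallest singular values are tiny but not exactly zero; whether they count towards the rank depends entirely on that cut-off.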

Remember that we are dealing with floating point values, where all operations come with truncation and rounding errors. In short, there is no sense in trying to compute an exact rank.

So instead you might reconsider what kind of tolerance you use; you could, e.g., use torch.linalg.matrix_rank(..., tol=1e-6). The smaller the tolerance, the higher the expected rank.
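
As a rough illustration of that trade-off (again assuming model_out from the question; note that recent PyTorch versions spell this keyword atol, while older ones accept tol as written above), the reported rank moves with the tolerance you pick:

import torch

for tol in (1e-3, 1e-5, 1e-7):
    rank = torch.linalg.matrix_rank(model_out, atol=tol)
    print(f'atol={tol:.0e} -> reported rank = {rank}')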

But no matter what kind of floating point precision you use, I'd argue you will never be able to find a meaningful "exact" number for the rank; it will always be a trade-off! Therefore I'd reconsider whether you really need to compute the rank in the first place, or whether there is some other kind of criterion that is better suited for numerical considerations.


1 Comment

Thanks, this precisely answered my question!
