14

I have the following code which calculates a loss function:

class MSE_loss(nn.Module):
    """ 
    : metric: L1, L2 norms or cosine similarity
    : mode: training or evaluation mode
    """

    def __init__(self,metric, mode, weighted_sum = False):
        super(MSE_loss, self).__init__()
        self.metric = metric.lower()
        self.loss_function = nn.MSELoss()
        self.mode = mode.lower()
        self.weighted_sum = weighted_sum

    def forward(self, output1, output2, labels):
        self.labels = labels         
        self.linear = nn.Linear(output1.size()[0],1)

        if self.metric == 'cos':
            self.d= F.cosine_similarity(output1, output2)
        elif self.metric == 'l1':
            self.d = torch.abs(output1-output2)
        elif self.metric == 'l2':
            self.d = torch.sqrt((output1-output2)**2)

        def dimensional_reduction(forward):
            if self.weighted_sum:
                distance = self.linear(self.d)
            else:
                distance = torch.mean(self.d,1)
            return distance

        def estimate_loss(forward):
            distance = dimensional_reduction(self.d)
            pred = torch.exp(-distance)
            pred = torch.round(pred)
            loss = self.loss_function(pred, self.labels)
            return pred, loss

        pred, loss = estimate_loss(self.d)

        if self.mode == 'training':
            return loss
        else:
            return pred, loss

Given

criterion = MSE_loss('l1','training', weighted_sum = True)

I would like to get the distance after having gone through the self.linear neuron when implementing criterion. I am, however, prompted with the error 'Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm' indicating that something is wrong. I have omitted the first part of the code, but I provide the whole error message, so that you can get an idea of what is going on.

RuntimeError                              Traceback (most recent call last)
<ipython-input-253-781ed4791260> in <module>()
      7 criterion = MSE_loss('l1','training', weighted_sum = True)
      8 
----> 9 train(test_net, train_loader, 10, batch_size, optimiser, clip, criterion)

<ipython-input-207-02fecbfe3b1c> in train(SNN, dataloader, epochs, batch_size, optimiser, clip, criterion)
     57 
     58             # calculate the loss and perform backprop
---> 59             loss = criterion(output1, output2, labels)
     60             a = [[n,p, p.grad] for n,p in SNN.named_parameters()]
     61 

~/.conda/envs/dalkeCourse/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

<ipython-input-248-fb88b987ce71> in forward(self, output1, output2, labels)
     49             return pred, loss
     50 
---> 51         pred, loss = estimate_loss(self.d)
     52 
     53         if self.mode == 'training':

<ipython-input-248-fb88b987ce71> in estimate_loss(forward)
     43 
     44         def estimate_loss(forward):
---> 45             distance = dimensional_reduction(self.d)
     46             pred = torch.exp(-distance)
     47             pred = torch.round(pred)

<ipython-input-248-fb88b987ce71> in dimensional_reduction(forward)
     36             else:
     37                 if self.weighted_sum:
---> 38                     self.d = self.linear(self.d)
     39                 else:
     40                     self.d = torch.mean(self.d,1)

~/.conda/envs/dalkeCourse/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

~/.conda/envs/dalkeCourse/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
     85 
     86     def forward(self, input):
---> 87         return F.linear(input, self.weight, self.bias)
     88 
     89     def extra_repr(self):

~/.conda/envs/dalkeCourse/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1368     if input.dim() == 2 and bias is not None:
   1369         # fused op is marginally faster
-> 1370         ret = torch.addmm(bias, input, weight.t())
   1371     else:
   1372         output = input.matmul(weight.t())

RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

self.d is however a tensor, but this has already been passed into GPU, as shown below:

self.d =
tensor([[3.7307e-04, 8.4476e-04, 4.0426e-04,  ..., 4.2015e-04, 1.7830e-04,
         1.2833e-04],
        [3.9271e-04, 4.8325e-04, 9.5238e-04,  ..., 1.5126e-04, 1.3420e-04,
         3.9260e-04],
        [1.9278e-04, 2.6530e-04, 8.6903e-04,  ..., 1.6985e-05, 9.5103e-05,
         1.9610e-04],
        ...,
        [1.8257e-05, 3.1304e-04, 4.6398e-04,  ..., 2.7327e-04, 1.1909e-04,
         1.5069e-04],
        [1.7577e-04, 3.4820e-05, 9.4168e-04,  ..., 3.2848e-04, 2.2514e-04,
         5.4275e-05],
        [4.2916e-04, 1.6155e-04, 9.3186e-04,  ..., 1.0950e-04, 2.5083e-04,
         3.7374e-06]], device='cuda:0', grad_fn=<AbsBackward>)
0

4 Answers 4

18

In the forward of your MSE_loss, you define a linear layer that is probably still in the CPU (you didn't provide an MCVE, so I can only assume):

self.linear = nn.Linear(output1.size()[0], 1)

If you want to try and see if this is the problem, you can:

self.linear = nn.Linear(output1.size()[0], 1).cuda()

However, if self.d is in the CPU, then it would fail again. To solve this, you could move the linear to the same device of the self.d tensor by doing this:

def forward(self, output1, output2, labels):
    self.labels = labels         
    self.linear = nn.Linear(output1.size()[0], 1)

    if self.metric == 'cos':
        self.d = F.cosine_similarity(output1, output2)
    elif self.metric == 'l1':
        self.d = torch.abs(output1-output2)
    elif self.metric == 'l2':
        self.d = torch.sqrt((output1-output2)**2)

    # move self.linear to the correct device
    self.linear = self.linear.to(self.d.device)
Sign up to request clarification or add additional context in comments.

Comments

5

Just as a supplement or a general answer, each time you meet this cuda and cpu unmatched error, you should first check the following three things:

  1. Whether you put your model on cuda, in other words, whether you have the similar code as:
    model = nn.DataParallel(model, device_ids=None).cuda()
  2. Whether you put your input data on cuda, like input_data.cuda()
  3. Whether you put your tensor on cuda, something like:
    loss_sum = torch.tensor([losses.sum], dtype=torch.float32, device=device)

Emm, if you do the three checks, maybe you will solve your problem, good luck.

Comments

0

I also meet the same problem when build my model, and finally I find this is because I re-train the full-connected layer of my model, like this:

net.to(device)
pre_trained_model=model_path
missing_keys,unexpected_keys=net.load_state_dict(torch.load(pre_trained_model),strict=False)
net.fc=nn.Linear(inchannel,CLASSES)

Although the model was transport to cuda the re-defined fc is not, so the last line should be:

net.fc=nn.Linear(inchannel,CLASSES).to(device)

so check out whether this situation happen may help.

1 Comment

what is net.fc ? can you explain why net is to(device) but net.fc is not please ?
0

I have the same problem, and it turns out that I should use customized_block = nn.ModuleList([]) instead of customized_block = [] when defining the model.

Since modules in a normal list will not be recognized as nn.Module, it won't be put on GPU when calling model.cuda().

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.