
After converting module A to CPU, do the original parameter tensors still stay on the GPU? When are they released? Is it wrong if I reuse the parameters?

My code:

import torch.nn as nn

class A(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 5)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.fc(x))

a = A().to('cuda')

weight = {}
for key, value in a.state_dict().items():
    weight[key] = value

a.to('cpu')
print("a.state_dict() device:", [t.device for t in a.state_dict().values()])  # in CPU
print("weight device:", [t.device for t in weight.values()])  # still in GPU

Result:

a.state_dict() device: [device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu')]

weight device: [device(type='cuda', index=0), device(type='cuda', index=0), device(type='cuda', index=0), device(type='cuda', index=0)]

Why are the tensors in weight still on the GPU?


2 Answers


The Python for loop creates copies of the values; it does not reference the original values in a.

Here is a small reproducible example without tensors.

original_dict = {'a': 1, 'b': 2, 'c': 3}

new_dict = {}
for key, value in original_dict.items():
    new_dict[key] = value

print("Original dict:", original_dict) # {'a': 1, 'b': 2, 'c': 3}
print("New dict:", new_dict) # {'a': 1, 'b': 2, 'c': 3}

# Changing original dict values to show they are independent
original_dict['a'] = 10
original_dict['b'] = 20
original_dict['c'] = 30

print("After modifying original dict:") 
print("Original dict:", original_dict) # {'a': 10, 'b': 20, 'c': 30}
print("New dict:", new_dict) # {'a': 1, 'b': 2, 'c': 3}

So, when you move a back to the CPU, it does not affect where the tensors in weight reside, because weight contains only copies of the values of a that were on the GPU at the time of the copy.
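If the goal is for the stored weights to end up on the CPU as well, a minimal sketch (using a bare nn.Linear as a stand-in for the question's module, and assuming a CUDA device is available) is to either move the stored copies explicitly or re-read state_dict() after the move:

import torch.nn as nn

a = nn.Linear(10, 5).to('cuda')                 # stand-in for the module in the question

# snapshot taken while a is still on the GPU (same idea as the loop in the question)
weight = {k: v for k, v in a.state_dict().items()}

a.to('cpu')

# Option 1: move the stored copies yourself
weight_cpu = {k: v.cpu() for k, v in weight.items()}

# Option 2: re-read state_dict() after the move, so the snapshot reflects the new device
weight = dict(a.state_dict())

print([t.device for t in weight_cpu.values()])  # [device(type='cpu'), device(type='cpu')]
print([t.device for t in weight.values()])      # [device(type='cpu'), device(type='cpu')]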


2 Comments

I would be careful with statements like this. Most objects in Python are actually passed by reference, i.e., you get the same object. You cannot generalize from the integer case to complex cases like PyTorch. In the OP's case, however, the state_dict() function indeed creates copies.
I think there’s a small clarification here. The loop isn’t what creates copies — state_dict() itself returns new tensors (a snapshot). Those tensors won’t move when the module moves devices, so the ones stored in weight stay on CUDA. Just adding this for completeness.
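A minimal sketch of that point, with no devices involved: the loop only copies references, so any copying of tensors has to come from state_dict() itself.

import torch

d = {'t': torch.zeros(2)}

copied = {}
for k, v in d.items():
    copied[k] = v

# The loop stored a reference to the very same tensor object, not a copy of it.
print(copied['t'] is d['t'])  # True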

In your code, I altered the printouts a bit, to better visualize (at least in my opinion) what's going on:

import torch.nn as nn

class A(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 5)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.fc(x))
    
a = A().to('cuda')
print("\nfrom `A().to('cuda')`")
print(f"{id(a.fc.weight)=}, {a.fc.weight.data_ptr()=}")
print(f"{id(a.fc.bias)=}, {a.fc.bias.data_ptr()=}")
# from `A().to('cuda')`
# id(a.fc.weight)=138293720716624, a.fc.weight.data_ptr()=138293368850432
# id(a.fc.bias)=138293720716720, a.fc.bias.data_ptr()=138293368850944

weight = {}
for key, value in a.state_dict().items():
    weight[key] = value
print("\nfrom `weight`")
for key, value in weight.items():
    print(f"{key}, {id(value)=}, {value.data_ptr()=}")
# from `weight`
# fc.weight, id(value)=138293720716816, value.data_ptr()=138293368850432
# fc.bias, id(value)=138293720716528, value.data_ptr()=138293368850944

a.to('cpu')
print("\nfrom `a.to('cpu')`")
print(f"{id(a.fc.weight)=}, {a.fc.weight.data_ptr()=}")
print(f"{id(a.fc.bias)=}, {a.fc.bias.data_ptr()=}")
# from `a.to('cpu')`
# id(a.fc.weight)=138293720716624, a.fc.weight.data_ptr()=101884008832832
# id(a.fc.bias)=138293720716720, a.fc.bias.data_ptr()=101884008983616

If you compare the first two blocks of printouts (the one from A().to('cuda') and the one from the weight dict), you get:

# from `A().to('cuda')`
# id(a.fc.weight)=138293720716624, a.fc.weight.data_ptr()=138293368850432
# id(a.fc.bias)=138293720716720, a.fc.bias.data_ptr()=138293368850944

# from `weight`
# fc.weight, id(value)=138293720716816, value.data_ptr()=138293368850432
# fc.bias, id(value)=138293720716528, value.data_ptr()=138293368850944

The IDs are different, but the data pointers are the same. What this means: the weight dict contains shallow copies of the tensors in a ("copies" because they have a new ID, "shallow" because they point to the same memory; namely the one on the GPU). This is in line with state_dict(), which you are using to produce the weight dict, and which is documented to produce shallow copies.
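If you want copies that do not share memory with the module at all, a minimal sketch (again using a bare nn.Linear as a stand-in) is to clone() the tensors; then the data pointers differ as well:

import torch.nn as nn

lin = nn.Linear(10, 5)  # stand-in module; CPU is enough to show the point

shallow = lin.state_dict()['weight']                         # new tensor object, same storage
deep = {k: v.clone() for k, v in lin.state_dict().items()}   # clone() allocates new storage

print(shallow.data_ptr() == lin.weight.data_ptr())           # True  -> shared memory
print(deep['weight'].data_ptr() == lin.weight.data_ptr())    # False -> independent memory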

If you compare the first and last blocks of printouts (the one from A().to('cuda') and the one from a.to('cpu')), you have the opposite situation:

# from `A().to('cuda')`
# id(a.fc.weight)=138293720716624, a.fc.weight.data_ptr()=138293368850432
# id(a.fc.bias)=138293720716720, a.fc.bias.data_ptr()=138293368850944

# from `a.to('cpu')`
# id(a.fc.weight)=138293720716624, a.fc.weight.data_ptr()=101884008832832
# id(a.fc.bias)=138293720716720, a.fc.bias.data_ptr()=101884008983616

The IDs are the same, but the data pointers are different. What this means: the parameters of your model a (a.fc.weight and a.fc.bias) still refer to the same tensor objects (same IDs), but in the meantime, the tensors' underlying memory has been replaced (different data pointers; namely, now pointing to CPU memory).
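You can isolate this in-place replacement with a single Parameter (a minimal sketch, assuming a CUDA device is available): reassigning .data swaps the underlying storage while the Python object stays the same, which is roughly what Module.to() does for each parameter by default.

import torch

p = torch.nn.Parameter(torch.randn(3, device='cuda'))
ptr_gpu = p.data_ptr()

# Swap the underlying storage, keep the Parameter object itself
p.data = p.data.to('cpu')

print(p.data_ptr() == ptr_gpu)  # False -> new (CPU) storage behind the same object
print(p.device)                 # cpu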

Your code ends with

print("a.state_dict() device:", [t.device for t in a.state_dict().values()])  # in CPU
print("weight device:", [t.device for t in weight.values()])  # still in GPU

So you are comparing

  • the tensors of your model a's parameters (or rather, new shallow copies of them, since you call a.state_dict() once more), whose memory has by now been moved to the CPU, with
  • the shallow copies from earlier on (the items in the weight dict, which result from your first call to a.state_dict()), whose memory still resides on the GPU.

