
I'm trying to minimize a function in order to better understand the optimization process. As an example I used the Eggholder function (https://www.sfu.ca/~ssurjano/egg.html), which is two-dimensional. My goal is to get the values of my parameters (x and y) after every optimizer iteration so that I can visualize them afterwards.

Using PyTorch, I wrote the following code:

import torch

def eggholder_function(x):
    return -(x[1] + 47) * torch.sin(torch.sqrt(torch.abs(x[1] + x[0]/2 + 47))) - x[0]*torch.sin(torch.sqrt(torch.abs(x[0]-(x[1]+47))))

def minimize(function, initial_parameters):
    list_params = []
    params = initial_parameters
    params.requires_grad_()
    optimizer = torch.optim.Adam([params], lr=0.1)

    for i in range(5):
        optimizer.zero_grad()
        loss = function(params)
        loss.backward()
        optimizer.step()
        list_params.append(params)

    return params, list_params

starting_point = torch.tensor([-30.,-10.])
minimized_params, list_of_params = minimize(eggholder_function, starting_point)

The output is as follows:

minimized_params: tensor([-29.4984, -10.5021], requires_grad=True)

and

list of params:


[tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True)]

While I understand that minimized_params does in fact hold the final optimized values, why does list_of_params show the same values for every iteration?

Thank you and have a great day!

2 Answers


Because they all refer to the same object: you appended the same tensor five times. You can check this by comparing their ids:

id(list_of_params[0]), id(list_of_params[1])
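
For instance, the aliasing can be made visible with a tiny standalone sketch (purely illustrative, the variable names are made up):

import torch

t = torch.zeros(2)
history = [t, t]   # two list entries referring to the same tensor object
t.add_(1.0)        # in-place update, analogous to optimizer.step()
print(history)     # both entries now show tensor([1., 1.])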

You can clone the params to avoid that:

import torch
def eggholder_function(x):
    return -(x[1] + 47) * torch.sin(torch.sqrt(torch.abs(x[1] + x[0]/2 + 47))) - x[0]*torch.sin(torch.sqrt(torch.abs(x[0]-(x[1]+47))))

def minimize(function, initial_parameters):
    list_params = []
    params = initial_parameters
    params.requires_grad_()
    optimizer = torch.optim.Adam([params], lr=0.1)

    for i in range(5):
        optimizer.zero_grad()
        loss = function(params)
        loss.backward()
        optimizer.step()
        list_params.append(params.detach().clone())  # detach + clone stores a snapshot of the current values
        
    return params, list_params

starting_point = torch.tensor([-30.,-10.])
minimized_params, list_of_params = minimize(eggholder_function, starting_point)
# list_of_params:
[tensor([-29.9000, -10.1000]),
 tensor([-29.7999, -10.2001]),
 tensor([-29.6996, -10.3005]),
 tensor([-29.5992, -10.4011]),
 tensor([-29.4984, -10.5021])]
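
With the per-iteration values recorded, plotting the trajectory is straightforward; here is a minimal sketch, assuming matplotlib is installed (purely illustrative, not part of the original code):

import torch
import matplotlib.pyplot as plt

# Stack the recorded snapshots into a (num_iterations, 2) tensor
trajectory = torch.stack(list_of_params)

plt.plot(trajectory[:, 0], trajectory[:, 1], marker="o")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Parameter trajectory under Adam")
plt.show()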

2 Comments

We made the same answer at the same time :) I just used deepcopy instead
Haha. Either works!

The tensor is modified in place while you optimize, so all of its shallow copies (references) in the list change along with it. To fix your problem you should use deepcopy, since a deep copy does not change when the original is changed:

import torch
from copy import deepcopy

def eggholder_function(x):
    return -(x[1] + 47) * torch.sin(torch.sqrt(torch.abs(x[1] + x[0]/2 + 47))) - x[0]*torch.sin(torch.sqrt(torch.abs(x[0]-(x[1]+47))))

def minimize(function, initial_parameters):
    list_params = []
    params = initial_parameters
    params.requires_grad_()
    optimizer = torch.optim.Adam([params], lr=0.1)

    for i in range(5):
        optimizer.zero_grad()
        loss = function(params)
        loss.backward()
        optimizer.step()
        list_params.append(deepcopy(params))

    return params, list_params

starting_point = torch.tensor([-30.,-10.])
minimized_params, list_of_params = minimize(eggholder_function, starting_point)
print(minimized_params, list_of_params)

The outputs are now in fact different from one another, and only the last one equals the final result:

(tensor([-29.9000, -10.1000], requires_grad=True),
[tensor([-29.9000, -10.1000]),
 tensor([-29.7999, -10.2001]),
 tensor([-29.6996, -10.3005]),
 tensor([-29.5992, -10.4011]),
 tensor([-29.4984, -10.5021])])

Many problems can arise from using shallow copies. When writing the first version of my code I use deepcopy as much as possible, then remove it later when (if) memory optimization is needed.
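
As noted in the comments under the other answer, either deepcopy or detach().clone() works here; a minimal standalone sketch (illustrative, not from either answer) showing that both kinds of snapshot stay fixed while the live tensor keeps changing:

import torch
from copy import deepcopy

p = torch.tensor([1.0, 2.0], requires_grad=True)
snap_deep = deepcopy(p)          # independent copy of the current values
snap_clone = p.detach().clone()  # independent copy, without grad tracking

with torch.no_grad():
    p += 1.0                     # simulate an in-place optimizer update

print(p)           # updated to [2., 3.]
print(snap_deep)   # still holds the old values [1., 2.]
print(snap_clone)  # still holds the old values [1., 2.]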

4 Comments

@telias thanks to you for the interesting question, remember to also upvote the other answer that is also correct!
(upvoting means clicking the up-arrow, accepting means clicking the checkmark, only one answer can be accepted but all can be upvoted)
Unfortunately, upvoting is only available at 15+ rep, so I'm unable to do so :( sorry!
@telias No problem, you will soon gain reputation as you ask clear and interesting questions. Have a nice day :)
