
I'm trying to minimize a function in order to better understand the optimization process. As an example I used the Eggholder function (https://www.sfu.ca/~ssurjano/egg.html), which is two-dimensional. My goal is to get the values of my parameters (x and y) after every optimizer iteration so that I can visualize them afterwards.

Using PyTorch, I wrote the following code:

import torch

def eggholder_function(x):
    return -(x[1] + 47) * torch.sin(torch.sqrt(torch.abs(x[1] + x[0]/2 + 47))) - x[0]*torch.sin(torch.sqrt(torch.abs(x[0]-(x[1]+47))))

def minimize(function, initial_parameters):
    list_params = []
    params = initial_parameters
    params.requires_grad_()
    optimizer = torch.optim.Adam([params], lr=0.1)

    for i in range(5):
        optimizer.zero_grad()
        loss = function(params)
        loss.backward()
        optimizer.step()
        list_params.append(params)

    return params, list_params

starting_point = torch.tensor([-30.,-10.])
minimized_params, list_of_params = minimize(eggholder_function, starting_point)

The output is as follows:

minimized_params: tensor([-29.4984, -10.5021], requires_grad=True)

and

list of params:


[tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True)]

While I understand that minimized_params does in fact hold the final optimized values, why does list_of_params show the same values for every iteration?

Thank you and have a great day!

2 Answers


Because they all refer to the same object: you appended the same tensor five times. You can check this by comparing their ids:

id(list_of_params[0]), id(list_of_params[1])
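
For instance, the aliasing can be made visible with a tiny standalone sketch (purely illustrative, the variable names are made up):

import torch

t = torch.zeros(2)
history = [t, t]   # two list entries referring to the same tensor object
t.add_(1.0)        # in-place update, analogous to optimizer.step()
print(history)     # both entries now show tensor([1., 1.])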

You can clone the params to avoid that:

import torch
def eggholder_function(x):
    return -(x[1] + 47) * torch.sin(torch.sqrt(torch.abs(x[1] + x[0]/2 + 47))) - x[0]*torch.sin(torch.sqrt(torch.abs(x[0]-(x[1]+47))))

def minimize(function, initial_parameters):
    list_params = []
    params = initial_parameters
    params.requires_grad_()
    optimizer = torch.optim.Adam([params], lr=0.1)

    for i in range(5):
        optimizer.zero_grad()
        loss = function(params)
        loss.backward()
        optimizer.step()
        list_params.append(params.detach().clone())  # detach + clone stores a snapshot of the current values
        
    return params, list_params

starting_point = torch.tensor([-30.,-10.])
minimized_params, list_of_params = minimize(eggholder_function, starting_point)
# list_of_params:
[tensor([-29.9000, -10.1000]),
 tensor([-29.7999, -10.2001]),
 tensor([-29.6996, -10.3005]),
 tensor([-29.5992, -10.4011]),
 tensor([-29.4984, -10.5021])]
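
With the per-iteration values recorded, plotting the trajectory is straightforward; here is a minimal sketch, assuming matplotlib is installed (purely illustrative, not part of the original code):

import torch
import matplotlib.pyplot as plt

# Stack the recorded snapshots into a (num_iterations, 2) tensor
trajectory = torch.stack(list_of_params)

plt.plot(trajectory[:, 0], trajectory[:, 1], marker="o")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Parameter trajectory under Adam")
plt.show()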

2 Comments

We made the same answer at the same time :) I just used deepcopy instead
Haha. Either works!

The tensor is modified in place while you optimize, so all of its shallow copies (references) in the list change along with it. To fix your problem you should use deepcopy, since a deep copy does not change when the original is changed:

import torch
from copy import deepcopy

def eggholder_function(x):
    return -(x[1] + 47) * torch.sin(torch.sqrt(torch.abs(x[1] + x[0]/2 + 47))) - x[0]*torch.sin(torch.sqrt(torch.abs(x[0]-(x[1]+47))))

def minimize(function, initial_parameters):
    list_params = []
    params = initial_parameters
    params.requires_grad_()
    optimizer = torch.optim.Adam([params], lr=0.1)

    for i in range(5):
        optimizer.zero_grad()
        loss = function(params)
        loss.backward()
        optimizer.step()
        list_params.append(deepcopy(params))

    return params, list_params

starting_point = torch.tensor([-30.,-10.])
minimized_params, list_of_params = minimize(eggholder_function, starting_point)
print(minimized_params, list_of_params)

The outputs are now in fact different from one another, and only the last one equals the final result:

(tensor([-29.9000, -10.1000], requires_grad=True),
[tensor([-29.9000, -10.1000]),
 tensor([-29.7999, -10.2001]),
 tensor([-29.6996, -10.3005]),
 tensor([-29.5992, -10.4011]),
 tensor([-29.4984, -10.5021])])

Many problems can arise from using shallow copies. When writing the first version of my code I use deepcopy as much as possible, then remove it later when (if) memory optimization is needed.
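
As noted in the comments under the other answer, either deepcopy or detach().clone() works here; a minimal standalone sketch (illustrative, not from either answer) showing that both kinds of snapshot stay fixed while the live tensor keeps changing:

import torch
from copy import deepcopy

p = torch.tensor([1.0, 2.0], requires_grad=True)
snap_deep = deepcopy(p)          # independent copy of the current values
snap_clone = p.detach().clone()  # independent copy, without grad tracking

with torch.no_grad():
    p += 1.0                     # simulate an in-place optimizer update

print(p)           # updated to [2., 3.]
print(snap_deep)   # still holds the old values [1., 2.]
print(snap_clone)  # still holds the old values [1., 2.]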

4 Comments

@telias thanks to you for the interesting question, remember to also upvote the other answer that is also correct!
(upvoting means clicking the up-arrow, accepting means clicking the checkmark, only one answer can be accepted but all can be upvoted)
Unfortunately, upvoting is only available at 15+ rep, so I'm unable to do so :( sorry!
@telias No problem, you will soon gain reputation as you ask clear and interesting questions. Have a nice day :)
