Following this Medium post, I understand how to save and load my model (or at least I think I do). The post says the learning_rate is saved as part of the checkpoint. However, looking at this person's code (it's a GitHub repo with a lot of watchers and forks, so I'm assuming it isn't full of mistakes), the author writes:
def load_checkpoint(checkpoint_file, model, optimizer, lr):
    print("=> Loading checkpoint")
    checkpoint = torch.load(checkpoint_file, map_location=config.DEVICE)
    model.load_state_dict(checkpoint["state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer"])

    # If we don't do this then it will just have learning rate of old checkpoint
    # and it will lead to many hours of debugging \:
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr
Doesn't optimizer.load_state_dict(checkpoint["optimizer"]) already restore the learning rate of the old checkpoint? If so (I believe it does), why do they treat that as a problem ("If we don't do this then it will just have learning rate of old checkpoint and it will lead to many hours of debugging")?
There is no learning rate decay in the code anyway, so should it even matter?
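
To be concrete, here is the round trip I have in mind (again a toy sketch of my own, not the repo's code; the lr values are made up):

import torch
import torch.nn as nn

model = nn.Linear(2, 1)
old_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Save in the same format the repo's load_checkpoint expects
checkpoint = {"state_dict": model.state_dict(), "optimizer": old_optimizer.state_dict()}

# Later I "resume" training and configure a different learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer.load_state_dict(checkpoint["optimizer"])
print(optimizer.param_groups[0]["lr"])  # 0.0001 -- the checkpoint's lr replaces the new one

# which, as far as I can tell, is exactly what the loop in load_checkpoint undoes:
for param_group in optimizer.param_groups:
    param_group["lr"] = 1e-3

As far as I can tell, the reset only matters when the configured lr differs from the one stored in the checkpoint (for example because a scheduler changed it before saving, or the config was edited between runs), which is why I'm asking whether it matters here at all.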