
I am running my own custom deep belief network code in PyTorch with the LBFGS optimizer. Once optimization starts, GPU memory usage climbs steadily until it runs out completely after a couple of batches, and I'm not sure why. Should I be purging memory after each batch is run through the optimizer? My code is as follows (with the portion that causes the problem marked):

def fine_tuning(self, data, labels, num_epochs=10, max_iter=3):
        '''
        Parameters
        ----------
        data : TYPE torch.Tensor
            N x D tensor with N = num samples, D = num dimensions
        labels : TYPE torch.Tensor
            N x 1 vector of labels for each sample
        num_epochs : TYPE int, optional
            Number of fine-tuning epochs. The default is 10.
        max_iter : TYPE int, optional
            Maximum number of LBFGS iterations per optimizer step. The default is 3.

        Returns
        -------
        None.

        '''
        N = data.shape[0]
        #need to unroll the weights into a typical autoencoder structure
        #encode - code - decode
        for ii in range(len(self.rbm_layers)-1, -1, -1):
            self.rbm_layers.append(self.rbm_layers[ii])
        
        L = len(self.rbm_layers)
        optimizer = torch.optim.LBFGS(params=list(itertools.chain(*[list(self.rbm_layers[ii].parameters()) 
                                                                    for ii in range(L)]
                                                                  )),
                                      max_iter=max_iter,
                                      line_search_fn='strong_wolfe') 
        
        dataset     = torch.utils.data.TensorDataset(data, labels)
        dataloader  = torch.utils.data.DataLoader(dataset, batch_size=self.batch_size*10, shuffle=True)
        #fine tune weights for num_epochs
        for epoch in range(1,num_epochs+1):
            with torch.no_grad():
                #get squared error before optimization
                v = self.pass_through_full(data)
                err = (1/N) * torch.sum(torch.pow(data-v.to("cpu"), 2))
            print("\nBefore epoch {}, train squared error: {:.4f}\n".format(epoch, err))
        
            #*******THIS IS THE PROBLEM SECTION*******#
            for ii,(batch,_) in tqdm(enumerate(dataloader), ascii=True, desc="DBN fine-tuning", file=sys.stdout):
                print("Fine-tuning epoch {}, batch {}".format(epoch, ii))
                with torch.no_grad():
                    batch = batch.view(len(batch) , self.rbm_layers[0].visible_units)
                    if self.use_gpu: #are we using a GPU?
                        batch = batch.to(self.device) #if so, send batch to GPU
                    B = batch.shape[0]
                    def closure():
                        optimizer.zero_grad()
                        output = self.pass_through_full(batch)
                        loss = nn.BCELoss(reduction='sum')(output, batch)/B
                        print("Batch {}, loss: {}\r".format(ii, loss))
                        loss.backward()
                        return loss
                    optimizer.step(closure)

The error I get is:

DBN fine-tuning: 0it [00:00, ?it/s]Fine-tuning epoch 1, batch 0     
Batch 0, loss: 4021.35400390625  
Batch 0, loss: 4017.994873046875  
DBN fine-tuning: 0it [00:00, ?it/s] 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>   
  File "/home/deep_autoencoder/deep_autoencoder.py", line 260, in fine_tuning  
    optimizer.step(closure)  
  File "/home/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/autograd
/grad_mode.py", line 15, in decorate_context 
    return func(*args, **kwargs)  
  File "/home/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/optim/lb
fgs.py", line 425, in step  
    loss, flat_grad, t, ls_func_evals = _strong_wolfe( 
  File "/home/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/optim/lb
fgs.py", line 96, in _strong_wolfe 
    g_prev = g_new.clone(memory_format=torch.contiguous_format) 
RuntimeError: CUDA out of memory. Tried to allocate 1.57 GiB (GPU 0; 24.00 GiB total capac
ity; 13.24 GiB already allocated; 1.41 GiB free; 20.07 GiB reserved in total by PyTorch)

Memory usage also climbs like this when I run on the CPU, so I'm not sure what the solution is here...

1 Answer

The official documentation for LBFGS says:

This is a very memory intensive optimizer (it requires additional param_bytes * (history_size + 1) bytes). If it doesn’t fit in memory try reducing the history size, or use a different algorithm.

You didn't specify the history_size parameter when constructing torch.optim.LBFGS, so it defaults to 100. Given that the first two batches already used more than 10 GB of memory, keeping that much history would likely require hundreds of GB.
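
For a rough sense of scale, here is a back-of-envelope sketch applying the documented param_bytes * (history_size + 1) figure; the two-layer model is a hypothetical stand-in, since I don't know your exact layer sizes:

    import torch

    def lbfgs_history_bytes(params, history_size=100):
        # Rough upper bound on the extra state LBFGS keeps, per the documented
        # param_bytes * (history_size + 1) figure.
        param_bytes = sum(p.numel() * p.element_size() for p in params)
        return param_bytes * (history_size + 1)

    # Hypothetical stand-in for the unrolled DBN; substitute the chained
    # rbm_layers parameters that you pass to LBFGS in fine_tuning().
    model = torch.nn.Sequential(torch.nn.Linear(4096, 1024), torch.nn.Linear(1024, 4096))
    extra = lbfgs_history_bytes(model.parameters(), history_size=100)
    print("~{:.2f} GiB of extra optimizer memory".format(extra / 1024**3))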

I'd suggest setting history_size to 1 to confirm that the problem is indeed caused by saving too much history. If it is, you can address it by reducing the history size or the number of parameters.
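
A minimal sketch of that change, using a stand-in nn.Linear model since only the optimizer construction differs; in your code you would pass the same chained rbm_layers parameters as before:

    import torch

    # Stand-in parameters; in the question's code this would be the
    # itertools.chain(...) list built from self.rbm_layers.
    model = torch.nn.Linear(784, 256)

    optimizer = torch.optim.LBFGS(model.parameters(),
                                  max_iter=3,
                                  history_size=1,  # default is 100
                                  line_search_fn='strong_wolfe')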
