
Description

I am creating a function in R that embeds sentences using the sentence_transformers library from Python.

For reasons unknown to me, creating the model repeatedly under the same variable name eventually leaves insufficient GPU memory to allocate the transformer. To reproduce:

sentence_transformers <- reticulate::import("sentence_transformers")
for (i in 1:10) {
  print(i)
  bert_encoder <- sentence_transformers$SentenceTransformer("bert-large-nli-stsb-mean-tokens")
}

However, doing the same operation directly in Python does not produce an error:

from sentence_transformers import SentenceTransformer
for i in range(10):
    print(i)
    bert_encoder = SentenceTransformer("bert-large-nli-stsb-mean-tokens")

This happens with any model allocated on the GPU. On my NVIDIA GTX 1060 it reaches the 4th iteration; on smaller GPUs it crashes earlier. One temporary workaround is to create the model only once, outside the function, and then pass it in as a parameter as many times as needed, but I would rather avoid that because it adds an extra step, and calling multiple models might make it crash anyway.
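For reference, a minimal sketch of that workaround (embed_sentences() is an illustrative name, not my actual function):

sentence_transformers <- reticulate::import("sentence_transformers")

# Create the model once...
bert_encoder <- sentence_transformers$SentenceTransformer("bert-large-nli-stsb-mean-tokens")

# ...and pass it in on every call instead of recreating it
embed_sentences <- function(sentences, encoder) {
  encoder$encode(sentences)
}

embeddings <- embed_sentences(c("a first sentence", "a second sentence"), bert_encoder)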

Expected behaviour

The for loop finishes without an error

Observed behaviour

Error in py_call_impl(callable, dots$args, dots$keywords) : RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 2.95 GiB already allocated; 16.11 MiB free; 238.68 MiB cached)

Unsuccessful attempts at solving it

  1. The solutions proposed here
  2. Using numba as suggested here
  3. Declaring the variable explicitly in Python via reticulate::py_run_string(), then doing del bert_encoder and calling the garbage collector (sketched below)
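For the record, attempt 3 looked roughly like this (a reconstruction from the description above, not the exact code):

reticulate::py_run_string("
from sentence_transformers import SentenceTransformer
import gc

bert_encoder = SentenceTransformer('bert-large-nli-stsb-mean-tokens')
del bert_encoder
gc.collect()
")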

Details

Windows 10 Home

Python 3.7.4

R 4.0.1

Reticulate 1.16

Torch 1.3.1

Tensorflow 2.2.0

Transformers 2.11.0

sentence_transformers 0.2.6

  • Hi. I had the same issue with the tf$... syntax; for some reason GPU memory was not released after loops. What worked for me, I'm not sure why, was to reimplement the whole algorithm with keras functions: instead of tf$add I used layer_add, and then memory was released as it should be. Commented Jul 16, 2020 at 9:33
  • @JacobJacox Could you please elaborate on how I would do what you suggested? I am not implementing a network from scratch, but rather loading a pretrained model, so I cannot see how your suggestion might suit me. Commented Jul 16, 2020 at 9:37
  • I know it is not pleasant, but I did not find any other alternative for this memory loop problem. If you look at how custom models are implemented, you could probably do it in half a day, and if you follow their source code on GitHub, you can take their weights to speed up training. If everything else fails, it is almost certain this will work. Commented Jul 16, 2020 at 9:43
  • @JacobJacox Have a look at my proposed solution. It solved the issue for me. Commented Aug 20, 2020 at 10:01
  • Hi, I see. Unfortunately I am not familiar with PyTorch, only TensorFlow, but kudos if it works :). Commented Aug 20, 2020 at 13:18
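For readers wondering what that keras substitution looks like in R, a minimal illustrative sketch (the toy model here is made up for illustration, not code from the thread):

library(keras)

input_a <- layer_input(shape = 4)
input_b <- layer_input(shape = 4)

# instead of tf$add(input_a, input_b), use the keras layer wrapper
summed <- layer_add(list(input_a, input_b))

model <- keras_model(inputs = list(input_a, input_b), outputs = summed)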

1 Answer


OK, so I am posting my solution for anyone else having this issue.

After each call to the model, created as

sentence_transformers <- import("sentence_transformers")
encoder <- sentence_transformers$SentenceTransformer("bert-large-nli-stsb-mean-tokens")

I release GPU memory using

  # Check whether CUDA is available, i.e. whether the model sits on the GPU
  reticulate::py_run_string("import torch
is_cuda_available = torch.cuda.is_available()")

  # Release GPU memory
  if (isTRUE(reticulate::py$is_cuda_available)) {

    # Drop the Python-side reference, ignoring errors if it does not exist
    tryCatch(reticulate::py_run_string("del encoder"),
             warning = function(e) {},
             error = function(e) {})

    # Drop the R-side reference
    tryCatch(rm(encoder),
             warning = function(e) {},
             error = function(e) {})

    # Run R's garbage collector so reticulate releases the Python object
    gc(full = TRUE, verbose = FALSE)

    # Ask PyTorch to return its cached GPU memory to the driver
    reticulate::py_run_string("import torch
torch.cuda.empty_cache()")

  }

and it works perfectly.
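For convenience, the release logic can be wrapped in a small helper so the loop from the question runs to completion. This is a sketch under my own naming (release_gpu() is not part of any package; the caller drops its reference before the helper runs):

release_gpu <- function() {
  # Check whether CUDA is available
  reticulate::py_run_string("import torch
is_cuda_available = torch.cuda.is_available()")

  if (isTRUE(reticulate::py$is_cuda_available)) {
    # Run R's garbage collector, then let PyTorch return its cached GPU memory
    gc(full = TRUE, verbose = FALSE)
    reticulate::py_run_string("import torch
torch.cuda.empty_cache()")
  }
}

sentence_transformers <- reticulate::import("sentence_transformers")

for (i in 1:10) {
  print(i)
  encoder <- sentence_transformers$SentenceTransformer("bert-large-nli-stsb-mean-tokens")
  # ... use encoder here ...
  rm(encoder)    # drop the R-side reference before releasing
  release_gpu()
}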

