My problem is not related to the code; it is related to the "GPU memory" reported in the Windows Task Manager.

Briefly about the problem: I have an RTX 4090 video card with 24 GB of video memory. My code uses libtorch C++ v2.0.0 and is compiled with MSVC 2019 on Windows 10 x64 (CUDA 11.8). With 16 GB of RAM, my program worked fine: a simple training example took about 5 seconds per epoch, and the amount of used GPU memory was about 6.3 GB. Also, with 16 GB of RAM, Windows Task Manager showed the available "GPU memory" as 24 + 7.9 = 31.9 GB, where 7.9 GB is the "shared GPU memory" (apparently 50% of RAM). The situation was the same with 24 GB of RAM.

With 32 GB of RAM, the total "GPU memory" is now 24 + 15.9 = 39.9 GB, where 15.9 GB is the "shared GPU memory" (again 50% of RAM). Now, when the backward() method is executed, the used GPU memory increases from 4.4 GB to 37.4 GB, and after that each epoch takes a very long time to compute (about 7 minutes instead of 5 seconds).

The code I run is always the same! The only difference is the amount of "virtual" video memory: the mixture of GPU memory and 50% of system RAM that Windows Task Manager calls "GPU memory", while the physical video memory is called "dedicated GPU memory". For my code, when this "virtual" video memory (39.9 GB with 32 GB of RAM: 24 + 15.9 = 39.9 GB) is larger than 37.4 GB, libtorch automatically allocates about 38 GB of it to speed up the gradient computation (when the forward method is called), but since those 37.4 GB are not pure (physical) video memory, performance drops instead. Conversely, when this "virtual" video memory is less than 37.4 GB (with 16 GB of RAM: 24 + 7.9 = 31.9 GB), libtorch automatically uses only about 6.3 GB of physical video memory and the performance is relatively good (there is some growth in "GPU memory" only during the first epoch).
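
To tell whether allocations are spilling out of dedicated memory, the CUDA runtime can be queried directly: cudaMemGetInfo reports only the dedicated (physical) device memory, so a gap between its numbers and Task Manager's combined "GPU memory" counter indicates shared-memory usage. A minimal sketch (printDedicatedGpuMemory is just an illustrative helper, not part of my actual code):

```cpp
#include <cuda_runtime.h>
#include <iostream>

// Print how much dedicated (physical) GPU memory is currently in use.
// cudaMemGetInfo only sees device memory, so if Task Manager's
// "GPU memory" keeps growing while this stays near 24 GB, the extra
// allocations are landing in "shared GPU memory" (i.e., system RAM).
void printDedicatedGpuMemory() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::cerr << "cudaMemGetInfo failed: "
                  << cudaGetErrorString(err) << "\n";
        return;
    }
    const double gib = 1024.0 * 1024.0 * 1024.0;
    std::cout << "dedicated GPU memory used: "
              << (totalBytes - freeBytes) / gib << " GiB of "
              << totalBytes / gib << " GiB\n";
}
```

Calling this helper before and after backward() shows whether the 37.4 GB figure lives in dedicated memory or only in Task Manager's combined counter.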

In these screenshots, we can observe the intensive GPU memory consumption when executing the backward() method for the same executable file; it varies depending on the amount of RAM:

16 GB: [screenshot: Windows Task Manager, 16 GB RAM]

24 GB: [screenshot: Windows Task Manager, 24 GB RAM]

32 GB: [screenshot: Windows Task Manager, 32 GB RAM]

So my questions are: how can I limit the available GPU memory in CUDA or libtorch? For example, is there a function in CUDA/libtorch that tells libtorch it may only use the 24 GB of dedicated GPU memory? Alternatively, how can I reduce (control) the "shared GPU memory" under Windows (I didn't find the appropriate options in my BIOS)?
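
One option I am aware of on the libtorch side, sketched under the assumption that the C++ caching allocator behaves like its Python counterpart torch.cuda.set_per_process_memory_fraction: c10::cuda::CUDACachingAllocator::setMemoryFraction caps what the caching allocator will request, as a fraction of the device's dedicated memory. I am not certain it suppresses the Windows driver's shared-memory fallback in every case, but it does stop the allocator itself from growing past the cap:

```cpp
#include <torch/torch.h>
#include <c10/cuda/CUDACachingAllocator.h>

int main() {
    // Touch the device once so the CUDA context and the caching
    // allocator are initialized before we configure them.
    torch::Tensor warmup = torch::zeros({1}, torch::device(torch::kCUDA));

    // Cap the caching allocator at ~95% of the 24 GB of dedicated
    // memory on device 0 (the fraction is relative to dedicated
    // device memory, not to Task Manager's combined "GPU memory").
    // Allocations beyond the cap raise an out-of-memory error
    // instead of silently growing.
    c10::cuda::CUDACachingAllocator::setMemoryFraction(/*fraction=*/0.95,
                                                       /*device=*/0);

    // ... build the model and run the training loop as before ...
    return 0;
}
```

With the cap in place, an out-of-memory error during backward() would at least make the oversubscription explicit, and the batch size could then be reduced to fit in dedicated memory.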

  • When you train, you should consume GPU compute ("3D") resources. In your first screenshot you allocated memory but barely consumed any compute, while in the last screenshot, where memory consumption is large, you consume significant GPU compute and memory resources. That looks strange. Commented Oct 12, 2023 at 19:13
  • I think the "3D" engine is used intensively when copying occurs between the GPU and RAM, i.e. when 38 GB of GPU memory is used. During training, CUDA is used intensively (unfortunately, I did not enable its display in the Task Manager). Commented Oct 13, 2023 at 11:11
