
I am training a CNN using Keras and TensorFlow on my NVIDIA RTX 2000 Ada Generation Laptop GPU on Ubuntu 24.04 with the recommended nvidia-driver-570.

When I train my network with the CPU-only version of TensorFlow, it takes a long time per epoch, and when I switch to the GPU version, it takes the same amount of time. To confirm I am using the GPU, I have checked that TensorFlow sees a GPU:

Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6067 MB memory: -> device: 0, name: NVIDIA RTX 2000 Ada Generation Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
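The same check can also be done programmatically; a minimal sketch (assuming a TensorFlow 2.x install):

```python
import tensorflow as tf

# An empty list here would mean TensorFlow cannot see the GPU at all;
# a non-empty list only confirms visibility, not actual utilization.
gpus = tf.config.list_physical_devices('GPU')
print("GPUs visible to TensorFlow:", gpus)

# Also confirm the installed wheel was built with CUDA support.
print("Built with CUDA:", tf.test.is_built_with_cuda())
```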

and checked 'watch nvidia-smi':

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   61C    P3             17W /   60W |    6301MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            4124      G   /usr/lib/xorg/Xorg                       95MiB |
|    0   N/A  N/A           15531      C   python                                 6180MiB |
+-----------------------------------------------------------------------------------------+

The output above shows that TensorFlow allocated about 6 of 8 GB of VRAM, but GPU utilization is 0%. The CPU preprocessing of the data fed into my training model is not that intensive. If I add "tf.debugging.set_log_device_placement(True)" to my code, I can see operations being assigned to the GPU. However, utilization stays at 0-5%.
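For reference, the device-placement logging I enabled looks like this (a minimal sketch; the call must come before any ops are created):

```python
import tensorflow as tf

# Log which device each op executes on; must run before any
# tensors or ops are created in the program.
tf.debugging.set_log_device_placement(True)

# A trivial op whose placement gets logged, e.g.
# "Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0"
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.eye(2)          # 2x2 identity, so the product should equal `a`
c = tf.matmul(a, b)
print(c)
```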

When I run my network, I get these warnings, which I don't think are causing any issues, since they were also present last week when the GPU appeared to be working fine and training the same network was faster:

external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1747335420.459420 15531 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1747335420.479948 15531 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1747335420.623362 15531 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1747335420.623382 15531 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1747335420.623385 15531 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1747335420.623387 15531 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-05-15 11:57:00.639265: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

How can I truly confirm whether or not training computations are being run on the GPU vs CPU?

  • The computations are running on the GPU, you are just getting an incomplete picture by using nvidia-smi, use something like nvtop to see GPU utilization over time as a plot. Also what exact model are you training? Might be too small to benefit from GPUs. Commented May 15 at 20:41
  • Ah okay, I do see that the GPU is very sporadically being used to train the network with nvtop. I guess I didn't realize how small the network I was training was. I will post a new question regarding keras/tensorflow crashing since that seems to be a separate issue. Both the cpu and gpu version of tensorflow tend to crash my PC somewhere in epoch 1-3. Commented May 21 at 19:38
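One way to make the GPU-vs-CPU difference from the comments above unambiguous is to time a workload large enough to saturate the device; a hedged sketch (device strings and sizes are illustrative, assuming a TF 2.x install):

```python
import time
import tensorflow as tf

def time_matmul(device, n=2000, reps=10):
    """Time `reps` large matmuls on the given device string."""
    with tf.device(device):
        x = tf.random.normal((n, n))
        tf.matmul(x, x)               # warm-up: kernel launch / transfer cost
        start = time.perf_counter()
        for _ in range(reps):
            y = tf.matmul(x, x)
        _ = y.numpy()                 # block until the device finishes
    return time.perf_counter() - start

print("CPU:", time_matmul('/CPU:0'))
if tf.config.list_physical_devices('GPU'):
    print("GPU:", time_matmul('/GPU:0'))
```

If the two times are close even for a large `n`, the work really is staying on the CPU; conversely, a tiny network can legitimately show near-0% utilization in nvidia-smi's instantaneous samples, which is why a tool like nvtop that plots utilization over time is more informative.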

