I am training a CNN using Keras and TensorFlow on my NVIDIA RTX 2000 Ada Generation Laptop GPU, on Ubuntu 24.04 with the recommended nvidia-driver-570.
When I train my network with the CPU-only version of TensorFlow, each epoch takes a long time, and when I switch to the GPU version, it takes the same amount of time. To confirm that I am using the GPU, I have checked that TensorFlow sees a GPU:
Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6067 MB memory: -> device: 0, name: NVIDIA RTX 2000 Ada Generation Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
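From Python, my check looks roughly like this (a minimal sketch; I just list the devices TensorFlow reports and confirm the build is CUDA-enabled):

import tensorflow as tf

# List the GPUs TensorFlow can see and confirm this build was compiled with CUDA support.
print(tf.config.list_physical_devices('GPU'))
print(tf.test.is_built_with_cuda())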
I have also checked 'watch nvidia-smi':
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   61C    P3             17W /  60W  |    6301MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            4124      G   /usr/lib/xorg/Xorg                       95MiB |
|    0   N/A  N/A           15531      C   python                                 6180MiB |
+-----------------------------------------------------------------------------------------+
The output above shows that TensorFlow has allocated about 6 GB of the 8 GB of VRAM, but GPU utilization sits at 0%. The CPU preprocessing of the data fed into my training model is not that intensive. If I add "debugging.set_log_device_placement(True)" to my code, I can see operations being assigned to the GPU. However, utilization still hovers around 0-5%.
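For reference, this is roughly how I enable the placement logging (a sketch only; the matmul is just a throwaway op to see where it lands):

import tensorflow as tf

# Must be called before any ops or tensors are created.
tf.debugging.set_log_device_placement(True)

# Throwaway op just to see where it is placed; the log prints lines like
# "Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0"
a = tf.random.normal((1024, 1024))
b = tf.random.normal((1024, 1024))
c = tf.matmul(a, b)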
When I run my network, I get the warnings below, which I don't think are causing any issues, since they were also there last week when the GPU appeared to be working fine and training the same network was faster:
external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1747335420.459420 15531 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1747335420.479948 15531 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1747335420.623362 15531 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1747335420.623382 15531 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1747335420.623385 15531 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1747335420.623387 15531 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-05-15 11:57:00.639265: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
How can I truly confirm whether the training computations are actually running on the GPU or on the CPU?
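For example, would a simple timing comparison like the following be a reliable test? (The matrix size and repeat count are just placeholder values I picked for illustration.)

import time
import tensorflow as tf

# Time the same matmul pinned to each device and compare wall-clock times.
for device in ('/CPU:0', '/GPU:0'):
    with tf.device(device):
        x = tf.random.normal((4096, 4096))
        _ = tf.matmul(x, x).numpy()          # warm-up
        start = time.perf_counter()
        for _ in range(10):
            y = tf.matmul(x, x)
        _ = y.numpy()                        # force any asynchronous GPU work to finish
        print(device, time.perf_counter() - start)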