Newest 'nvidia' Questions

1 vote

1 answer

131 views

How to correctly install JAX with CUDA on Linux when `jax[cuda12_pip]` consistently falls back to the CPU version?

I am trying to install JAX with GPU support on a powerful, dedicated Linux server, but I am stuck in what feels like a Catch-22 where every official installation method fails in a different way, ...

PowerPoint Trenton

115

asked Nov 12 at 9:36

1 vote

1 answer

57 views

How to debug CUDA in Visual Studio with "step over"

I installed NVIDIA Nsight Visual Studio Edition 2025.01 in Visual Studio 2022. I want to debug my code, but step over (F10) does not work: the debugger always stops at a location without a breakpoint....

Imagination Youth

11

asked Oct 31 at 2:36

1 vote

0 answers

132 views

Why does “Command Buffer Full” appear in PyTorch CUDA kernel launches?

I’m using the PyTorch profiler to analyze sglang, and I noticed that in the CUDA timeline, some kernels show “Command Buffer Full”. This causes the cudaLaunchKernel time to become very long, as shown ...

plznobug

143

asked Oct 23 at 12:36

2 votes

0 answers

215 views

jax plugin configuration error: Exception when calling jax_plugins.xla_cuda12.initialize()

I am using WSL2 on Windows 10 with an NVIDIA graphics card. I recently installed GPU JAX using the command pip install -U "jax[cuda12]". This completed successfully, but when I run any JAX ...

DrMittal

51

asked Oct 14 at 14:14

0 votes

1 answer

99 views

CPU-GPU producer-consumer pattern using unified memory but GPU is in spin loop

I am trying to implement the producer-consumer problem between the GPU and the CPU, which I need for another project. The GPU requests some data from the CPU via unified memory. The CPU copies that data to a specific location in global ...

Chinmaya Bhat K K

1

asked Sep 30 at 18:38

0 votes

0 answers

150 views

TensorRT DLA Engine Build Fails for PWC-Net on Jetson NX - Missing Layer Support?

I'm converting a PWC-Net optical flow model to run on Jetson NX DLA using the iSLAM framework, but the TensorRT engine build fails during DLA optimization. Environment: Hardware: NVIDIA Jetson NX ...

Unknown

705

asked Sep 15 at 7:33

0 votes

1 answer

134 views

How to correctly monitor a program’s GPU memory bandwidth utilization and SM utilization? (DCGM DRAM_ACTIVE vs in-program bandwidth differs a lot)

I want to quantitatively measure the memory bandwidth utilization and SM utilization of a CUDA program for performance analysis and regression testing. My approach so far: Compute the theoretical ...

plznobug

143

asked Sep 5 at 10:48

1 vote

1 answer

283 views

How are fp6 and fp4 supported on NVIDIA Tensor Cores on Blackwell?

I am writing PTX assembly code in CUDA C++ for research. This is my setup: yesterday I downloaded the latest CUDA Toolkit (13.0) on WSL Linux. The local compilation environment does not ...

Junhao Liu

11

asked Aug 14 at 10:03

2 votes

1 answer

67 views

How to correctly pass float4 vector to kernel using PyCUDA?

I am trying to pass a float4 as an argument to my CUDA kernel (by value) using PyCUDA’s make_float4(), but there seems to be some misalignment when the data is transferred to the kernel. If I read the ...

Dodilei

308

asked Aug 7 at 19:49

2 votes

0 answers

40 views

What do shuffle instructions do on the hardware? [duplicate]

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-shfl-...

Tom Huntington

3,705

asked Jul 23 at 5:49

1 vote

1 answer

190 views

(NVIDIA/nv-embed-v2) ImportError: cannot import name 'MISTRAL_INPUTS_DOCSTRING' from 'transformers.models.mistral.modeling_mistral'

My code: from transformers import AutoTokenizer, AutoModel model_name = "NVIDIA/nv-embed-v2" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModel.from_pretrained(...

6zL

21

asked Jul 5 at 13:27

0 votes

0 answers

50 views

Problem copying data from one GPU to another using torch

Recently, I upgraded Ubuntu from 22.04 to 24.04 and found that the performance of my trained deep network, written in PyTorch, had degraded. After debugging, I found that the problem is copying data from one GPU to ...

wangwei

49

asked Jun 30 at 16:14

-4 votes

1 answer

42 views

For a large project, how should I search for a specific kernel in nsys?

When analyzing a large project with many kernel files, I want to find a specific kernel in the report produced by nsys. How should I do this?

rongtao zhou

1

asked Jun 23 at 8:44

2 votes

0 answers

81 views

Reusing shared data between global functions

Is there an officially sanctioned way to reuse shared data between global functions? Consider the following code https://cuda.godbolt.org/z/KMj9EKKbf: #include <cuda.h> #include <stdio.h> ...

Johan

77.4k

asked Jun 3 at 9:42

-4 votes

1 answer

275 views

Is Bryce Lelbach's claim regarding progress guarantees on non-NVIDIA GPUs true?

In a talk on "The C++ Execution Model" at the cppunderthesea 2024 conference, at around 44:50, NVIDIA's Bryce Adelstein Lelbach claims that non-NVIDIA GPUs give no guarantee of threads progressing ("...

einpoklum

137k

asked May 30 at 11:36

0 votes

0 answers

260 views

docker-compose.yaml: services.OD Additional property device_requests is not allowed

I am trying to run docker compose and it fails. Here is a minimal reproducible example. First, with this docker-compose.yml: services: hello-app: image: python:3.10-slim command: python ...

KansaiRobot

10.6k

asked May 20 at 2:58

0 votes

0 answers

87 views

Trouble Detecting NIC Ports in DPDK Program with ConnectX-6

I'm encountering an issue while developing a DPDK-based program using a dual-port ConnectX-6 NIC on Ubuntu 24.04. Despite following the setup instructions, my program fails to detect the NIC ports. ...

Mohammad P

1

asked Apr 27 at 9:49

0 votes

0 answers

58 views

How to deal with a process holding NVIDIA GPU memory after termination? [duplicate]

I am facing an issue with a process that holds GPU memory even after I have terminated it. Here's a detailed breakdown of the situation: The process (a CUDA application) is running and occupies GPU ...

Elspeth Gilbert

1

asked Apr 2 at 13:16

2 votes

2 answers

97 views

Standard way of calling math functions in C when using OpenMP & its offloading feature(s)?

I am writing some code in C in which I want to add the optional ability to have certain sections of the code accelerated using OpenMP, and with an additional optional ability to have them accelerated ...

Matthew G.

124

asked Mar 30 at 19:01

0 votes

1 answer

74 views

Why is TensorFlow not recognizing my GPU after installing it with Anaconda?

So I'm trying to use TensorFlow with my YOLOv8 project, but for some reason it is not recognizing my GPU. I had originally installed it using pip, but I was told I should use conda instead, so I switched ...

James Pelham-Burn

1

asked Mar 27 at 2:17

0 votes

0 answers

120 views

TensorRT Access Violation Error (0xC0000005) at nvinfer_10.dll - How to Resolve?

Environment: OS: Windows; TensorRT version: TensorRT-10.3.0.26; NVIDIA CUDA version: 12.6; cuDNN version: 9.8; GPU: RTX 3050 Ti Laptop GPU. Issue description: I am encountering an "...

B.Uluer

11

asked Mar 21 at 13:31

1 vote

0 answers

50 views

DuplicateOutput Fails with NVIDIA set as Preferred Graphics Adapter on Dual Graphics System

I have a laptop with an integrated Intel graphics card and an NVIDIA T1000 graphics card. I set the NVIDIA card as the preferred graphics processor under Manage 3D Settings in the NVIDIA Control Panel. However, ...

Martin121233

21

asked Mar 13 at 22:56

2 votes

0 answers

252 views

Not able to access GPU within the docker container

I am using Ubuntu 22.04. I have the nvidia-570 driver installed along with CUDA 12.4 on my host machine. However, I am not able to access the GPU in my container. This is my docker-compose file: version: '3.8' ...

prarthana sigedar

21

asked Mar 5 at 7:59

1 vote

0 answers

134 views

Error response from daemon: could not select device driver "nvidia" with capabilities: [[utility compute]]

I'm trying to build a Docker container for realtime-whiper. The build process finishes successfully, but at the end it gives this error: Error response from daemon: could not select device driver "nvidia"...

Ali Zekai Deveci

11

asked Mar 4 at 14:25

0 votes

0 answers

152 views

IsaacGym visualization seg faulting; Vulkaninfo giving BadMatch error with X_CreateWindow failed request

I am trying to visualize something on Isaac Gym, but when I run gym.draw_viewer(viewer, sim, True) I get a segmentation fault. I think it has something to do with Vulkan because when I run vulkaninfo ...

Kai McClennen

3

asked Feb 23 at 20:03
