
Consider the following CUDA kernel, adapted from the CuPy docs, which is called from Python via CuPy:

import cupy as cp

add_kernel = cp.RawKernel(r'''
extern "C" __global__
void my_add(const float* x1, const float* x2, float* y) {

    extern __shared__ int sharedValues[];

    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    y[tid] = x1[tid] + x2[tid];
}
''', 'my_add')
x1 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
x2 = cp.arange(25, dtype=cp.float32).reshape(5, 5)

n = 1
shared_memory_size = n * 64  # * sizeof(int)

# Output array
y = cp.zeros((5, 5), dtype=cp.float32)

add_kernel((512,), (1024,), (x1, x2, y))  # grid, block and arguments

I would like to allocate dynamic shared memory in a CuPy RawKernel, but I don't know how to pass shared_memory_size to add_kernel.

In CUDA C/C++, following the NVIDIA blog post "Using Shared Memory in CUDA C/C++", the size is passed as the third launch-configuration parameter:

add_kernel<<<512,1024,n*sizeof(int)>>>(x1,x2,y);

When I try to pass it as an additional argument, the call fails:

add_kernel((512,), (1024,), (shared_memory_size,), (x1, x2, y))  # grid, block and arguments

When I try to set the attribute

assign_importance_into_dense_array_kernel_int32.shared_size_bytes = shared_memory_size;

I get the error

AttributeError: attribute 'shared_size_bytes' of 'cupy._core.raw.RawKernel' objects is not writable

There is also the attribute add_kernel.max_dynamic_shared_size_bytes, but does setting it change the amount of dynamic shared memory that is actually allocated?

  • The dynamic shared memory to be allocated per block is specified in the call itself. The attribute you are experimenting with is not how you set the dynamic shared memory actually allocated at the call point; it is something else. Right now your call leaves that (last) argument at its default of zero, so set it explicitly. When you try to call it "with an additional parameter", you are putting that parameter in the wrong place. Commented Oct 3, 2024 at 18:40

1 Answer


As @Robert said, you specify the dynamic shared memory size at the call site, in cp.RawKernel.__call__. According to the documentation:

https://docs.cupy.dev/en/stable/reference/generated/cupy.RawKernel.html

__call__(self, grid, block, args, *, shared_mem=0)

Set the keyword-only parameter shared_mem to the number of bytes you need (n*sizeof(int) in your CUDA C example), and you'll be good to go.

...
shared_memory_size = n * 4  # n * sizeof(int): a CUDA int is 4 bytes
add_kernel((512,), (1024,), (x1, x2, y), shared_mem=shared_memory_size)