Raw kernel with dynamically allocated shared memory

Question

Consider the following CUDA kernel, adapted from the CuPy docs and launched from Python via CuPy:

add_kernel = cp.RawKernel(r'''
extern "C" __global__
void my_add(const float* x1, const float* x2, float* y) {
    
    extern __shared__ int sharedValues[];
    
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    y[tid] = x1[tid] + x2[tid];
}
''', 'my_add')
x1 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
x2 = cp.arange(25, dtype=cp.float32).reshape(5, 5)

n = 1
shared_memory_size = n*64; #*sizeof(int)

# Output Results
y = cp.zeros((5, 5), dtype=cp.float32)

add_kernel((512,), (1024,), (x1, x2, y))  # grid, block and arguments

I would like to allocate dynamic shared memory for a CuPy raw CUDA kernel, but I don't know how to pass the parameter shared_memory_size to add_kernel.

In CUDA C/C++ the size is given as the third launch parameter (example taken from the NVIDIA blog post Using Shared Memory in CUDA C/C++):

add_kernel<<<512,1024,n*sizeof(int)>>>(x1,x2,y);

When I try to call it with an additional parameter, I get an error

add_kernel((512,), (1024,), (shared_memory_size,), (x1, x2, y))  # grid, block and arguments

When I try to set the attribute

add_kernel.shared_size_bytes = shared_memory_size

I get the error

AttributeError: attribute 'shared_size_bytes' of 'cupy._core.raw.RawKernel' objects is not writable

There is also the attribute add_kernel.max_dynamic_shared_size_bytes, but does setting it change the dynamic shared memory size?

The dynamic shared memory to be allocated per block is specified in the call itself. The attribute you are playing with is not how you set the amount of dynamic shared memory actually allocated at the call point; it is something else. Right now your call has that (last) argument defaulted to zero. Explicitly set it to something else. When you try to call it "with an additional parameter" you are putting that parameter in the wrong place.
– Robert Crovella, Commented Oct 3, 2024 at 18:40

Johan · Accepted Answer · 2024-10-04 15:06:21Z

As @Robert said, you specify the dynamic shared memory size in the call itself, through the cp.RawKernel.__call__ method. As per the documentation:

https://docs.cupy.dev/en/stable/reference/generated/cupy.RawKernel.html

__call__(self, grid, block, args, *, shared_mem=0)
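The bare * in that signature makes shared_mem keyword-only, which is why passing the size as an extra positional argument is rejected. A minimal sketch of that behaviour, using a hypothetical pure-Python stand-in (launch_stub is not part of CuPy and launches nothing; it only copies the parameter layout shown above):

```python
def launch_stub(grid, block, args, *, shared_mem=0):
    """Hypothetical stand-in with the same parameter layout as the documented
    __call__; it only echoes the launch configuration it received."""
    return {"grid": grid, "block": block, "args": args, "shared_mem": shared_mem}


def try_positional(size):
    """Pass the size as an extra positional argument, as in the failing call."""
    try:
        launch_stub((512,), (1024,), (size,), ("x1", "x2", "y"))
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


# Keyword form: accepted, and the size ends up in shared_mem.
config = launch_stub((512,), (1024,), ("x1", "x2", "y"), shared_mem=256)
print(config["shared_mem"])   # 256
print(try_positional(256))    # TypeError
```

When the keyword is omitted, the stand-in falls back to 0, matching the default in the documented signature.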

Set the last named parameter, shared_mem, to the size in bytes (the equivalent of n*sizeof(int)), and you'll be good to go.

...
shared_memory_size = n * 4  # bytes; sizeof(int) is 4 in CUDA
add_kernel((512,), (1024,), (x1, x2, y), shared_mem=shared_memory_size)
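Python has no sizeof operator, so the byte count has to be computed some other way. One option, sketched here with the standard-library ctypes module, assumes the host's C int has the same width as the device's int; dynamic_shared_bytes is a hypothetical helper name, not a CuPy function:

```python
import ctypes


def dynamic_shared_bytes(n_items, ctype=ctypes.c_int):
    """Bytes needed for n_items elements of the given C type."""
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    return n_items * ctypes.sizeof(ctype)


n = 64
shared_memory_size = dynamic_shared_bytes(n)  # plays the role of n * sizeof(int)
print(shared_memory_size)

# With CuPy and a GPU available, the value would then be passed by keyword:
# add_kernel((512,), (1024,), (x1, x2, y), shared_mem=shared_memory_size)
```

Passing a different ctypes type (for example ctypes.c_float) sizes the buffer for that element type instead.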

edited Oct 4, 2024 at 15:06

answered Oct 4, 2024 at 12:26
