Can it be used for types of different sizes, e.g. (float16, int32)?
In case it needs more spelling out, here is the basic approach:
1. Order the types in descending order of alignment. This can be surprising with structs or vector types (e.g. float3 has the same alignment as float, but float2 has a higher alignment). If unsure, pepper the code with static_assert(alignof(type1) >= alignof(type2)). It has no runtime cost and prevents other developers from switching the order without thinking.
2. Compute the size requirement without any extra precautions: sizeof(type1) * N + sizeof(type2) * M
3. Use pointer arithmetic to determine where one sub-array ends and the next starts.
__global__ void kernel(
        unsigned number_of_floats,
        unsigned number_of_shorts,
        unsigned number_of_bytes)
{
    // One dynamic shared-memory buffer, declared with the most strictly aligned type
    extern __shared__ float shared_floats[];
    static_assert(alignof(*shared_floats) >= alignof(short));
    // The shorts start right after the last float
    short* shared_shorts = (short*) (shared_floats + number_of_floats);
    static_assert(alignof(*shared_shorts) >= alignof(char));
    // The bytes start right after the last short
    char* shared_bytes = (char*) (shared_shorts + number_of_shorts);
}

void call_kernel(
        unsigned number_of_floats,
        unsigned number_of_shorts,
        unsigned number_of_bytes,
        dim3 blocks, dim3 threads)
{
    // Plain sum of the sub-array sizes; this order needs no padding
    size_t shared_size = sizeof(float) * number_of_floats
                       + sizeof(short) * number_of_shorts
                       + sizeof(char) * number_of_bytes;
    kernel<<<blocks, threads, shared_size>>>(
            number_of_floats, number_of_shorts, number_of_bytes);
}
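If the list of types grows, the pairwise static_assert checks can be folded into one compile-time check. The following is only a sketch: alignment_descending is a hypothetical helper name, not something CUDA provides, and std::uint16_t merely stands in for a 16-bit float type on the assumption that such a type is 2-byte aligned. It is plain C++, so it should also compile inside a .cu file.

```cpp
#include <cstdint>

// Hypothetical helper (not part of CUDA): value is true when every type
// in the list has an alignment >= that of the type following it.
template <typename... Types>
struct alignment_descending;

// A single type is trivially in order.
template <typename Last>
struct alignment_descending<Last> {
    static constexpr bool value = true;
};

// Compare the first two types, then recurse on the rest.
template <typename First, typename Second, typename... Rest>
struct alignment_descending<First, Second, Rest...> {
    static constexpr bool value =
        alignof(First) >= alignof(Second) &&
        alignment_descending<Second, Rest...>::value;
};

// The order used in the kernel above.
static_assert(alignment_descending<float, short, char>::value,
              "sub-arrays must be ordered by descending alignment");

// 32-bit integers before 16-bit values; std::uint16_t stands in for a
// 16-bit float type here (assumption: that type is 2-byte aligned).
static_assert(alignment_descending<std::int32_t, std::uint16_t>::value,
              "sub-arrays must be ordered by descending alignment");
```

So for a mix like (float16, int32), the same approach should work as long as the 32-bit sub-array comes first.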
Any more sophisticated solution, like the one I outlined in "CUDA : Shared memory alignement in documentation", is only really necessary if you write generic code, e.g. a template, that may not know the correct order for alignment a priori.
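For a rough idea of what such generic code has to do, here is a minimal sketch that rounds each sub-array's start offset up to the alignment of its element type, so the order of the types no longer matters. This is not necessarily the approach from the linked answer; align_up, aligned_offset and append_array are hypothetical names, and the sketch assumes that the base of the shared buffer is itself aligned at least as strictly as the strictest type.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two, which holds for alignof values.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Byte offset at which a sub-array of T may start, given the number of
// bytes already used by the preceding sub-arrays.
template <typename T>
constexpr std::size_t aligned_offset(std::size_t bytes_used)
{
    return align_up(bytes_used, alignof(T));
}

// Total bytes used after appending `count` elements of T, including any
// padding that had to be inserted in front of them.
template <typename T>
constexpr std::size_t append_array(std::size_t bytes_used, std::size_t count)
{
    return aligned_offset<T>(bytes_used) + sizeof(T) * count;
}

// Example: 3 chars followed by 2 floats. One padding byte is inserted
// so that the floats start at offset 4 instead of 3.
static_assert(aligned_offset<float>(append_array<char>(0, 3)) == 4,
              "floats start at the next 4-byte boundary");
```

The same functions would be used on the host to compute the shared memory size and in the kernel to compute the sub-array pointers. To call them from device code they would additionally need to be marked __host__ __device__.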
A struct may cause additional bank conflicts, depending on your use case.