What is the difference between dynamic shared memory as a kernel attribute and as a kernel argument in CUDA?

We are using dynamic shared memory in our CUDA kernels. We set the shared memory size for each kernel using the driver API function cuFuncSetAttribute with CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES.

The kernel is then launched using cuLaunchKernel, where the docs list unsigned int sharedMemBytes as one of the parameters. This parameter is documented as:

Dynamic shared-memory size per thread block in bytes

This means I can set the dynamic shared memory size as a kernel attribute, and additionally I can set it on each kernel launch.

Does this mean I can override the kernel attribute? Which one wins?

There is 1 answer

einpoklum
  • Kernel attribute -> maximum value (an upper bound on what a launch may request)
  • Launch configuration field -> actual value (what the kernel gets for that launch)

Says so right in the name: MAX_DYNAMIC_SHARED_SIZE_BYTES vs sharedMemBytes. Note the MAX prefix :-)
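To make the distinction concrete, here's a minimal sketch. Note the assumptions: it uses the runtime API counterparts (cudaFuncSetAttribute with cudaFuncAttributeMaxDynamicSharedMemorySize, and the third <<<...>>> launch parameter) rather than the driver API calls from the question, simply because that keeps the example in one self-contained file. The kernel fill_shared, the helpers launch_with and run_demo, and the 16 KiB / 64 KiB sizes are all made up for illustration, and the demo only does anything on a device that allows opting in to at least 64 KiB per block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: fills its dynamic shared-memory buffer and writes
// the last element back so the buffer is actually used.
__global__ void fill_shared(int *out, int n_ints)
{
    extern __shared__ int buf[];  // size comes from the launch, not the attribute
    for (int i = threadIdx.x; i < n_ints; i += blockDim.x)
        buf[i] = i;
    __syncthreads();
    if (threadIdx.x == 0)
        out[0] = buf[n_ints - 1];
}

// Launch one block with `bytes` of dynamic shared memory; return the status.
static cudaError_t launch_with(int *d_out, size_t bytes)
{
    fill_shared<<<1, 128, bytes>>>(d_out, static_cast<int>(bytes / sizeof(int)));
    cudaError_t err = cudaGetLastError();  // launch-configuration errors surface here
    return (err != cudaSuccess) ? err : cudaDeviceSynchronize();
}

// Returns 0 if the launches made after raising the maximum succeeded
// (or if the device cannot run the demo at all), 1 otherwise.
int run_demo()
{
    const size_t small_bytes = 16 * 1024;  // illustrative sizes, not from the question
    const size_t big_bytes   = 64 * 1024;

    int optin_max = 0;
    if (cudaDeviceGetAttribute(&optin_max,
                               cudaDevAttrMaxSharedMemoryPerBlockOptin, 0) != cudaSuccess)
        return 1;
    if (static_cast<size_t>(optin_max) < big_bytes) {
        std::printf("device allows at most %d bytes per block; skipping demo\n", optin_max);
        return 0;
    }

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess)
        return 1;

    // Maximum not raised yet: a request above the default limit should be rejected.
    cudaError_t before = launch_with(d_out, big_bytes);
    std::printf("64 KiB before raising the maximum: %s\n", cudaGetErrorString(before));

    // Raise the maximum. This only sets an upper bound for later launches.
    cudaError_t set = cudaFuncSetAttribute(
        fill_shared, cudaFuncAttributeMaxDynamicSharedMemorySize,
        static_cast<int>(big_bytes));

    // Each launch still names the amount it actually gets.
    cudaError_t small = launch_with(d_out, small_bytes);
    cudaError_t big   = launch_with(d_out, big_bytes);
    std::printf("16 KiB after: %s\n", cudaGetErrorString(small));
    std::printf("64 KiB after: %s\n", cudaGetErrorString(big));

    cudaFree(d_out);
    return (set == cudaSuccess && small == cudaSuccess && big == cudaSuccess) ? 0 : 1;
}

int main() { return run_demo(); }
```

On a device that supports the opt-in, I'd expect the first 64 KiB launch to be rejected and both launches after the attribute call to go through. In other words, nothing gets "overridden": the attribute sets the ceiling, and every launch states its own actual size underneath it.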

Setting a different maximum value may affect the GPU's behavior when running the kernel, e.g. how much regular L1 cache is allocated for use by the kernel. In some/most NVIDIA GPU micro-architectures, shared memory is repurposed L1 cache: their total amount is fixed, but the proportions aren't. See also §16.6.4 of the CUDA C++ Programming Guide.

Now, it's true that passing a specific amount of shared memory at launch could have implicitly done whatever setting the maximum does. But either that would carry some overhead on every launch, or it's just how NVIDIA has chosen to do things.