I know how to create a global device array from the host using np.array, np.zeros, or np.empty(shape, dtype) and then copying it over with cuda.to_device.
Also, one can declare a shared array as cuda.shared.array(shape, dtype).
But how do you create an array of constant size in the registers of a particular thread, inside a GPU function? I tried cuda.device_array and np.array, but neither worked.
I simply want to do this inside a thread:

x = array(CONSTANT, int32)  # should create x for each thread
Numbapro supports
numba.cuda.local.array(shape, type)
for defining thread-local arrays. As with CUDA C, whether that array ends up in local memory or registers is a compiler decision based on the array's usage patterns. If the indexing pattern of the local array is statically determined and there is sufficient register space, the compiler will store the array in registers; otherwise it will be stored in local memory. See this question and answer pair for more information.