I have the following program:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
mod = SourceModule("""
#include <stdio.h>
__global__ void myfirst_kernel()
{
    printf("I am in block no: %d thread no: %d \\n", blockIdx.x, threadIdx.x);
}
""")
function = mod.get_function("myfirst_kernel")
function(grid=(10,2),block=(1,1,1))
As you can see, I am running 10 blocks with 2 threads per block. However, the output is:
python thread_execution.py
I am in block no: 1 thread no: 0
I am in block no: 7 thread no: 0
I am in block no: 1 thread no: 0
I am in block no: 7 thread no: 0
I am in block no: 3 thread no: 0
I am in block no: 0 thread no: 0
I am in block no: 3 thread no: 0
I am in block no: 6 thread no: 0
I am in block no: 9 thread no: 0
I am in block no: 0 thread no: 0
I am in block no: 9 thread no: 0
I am in block no: 6 thread no: 0
I am in block no: 5 thread no: 0
I am in block no: 2 thread no: 0
I am in block no: 5 thread no: 0
I am in block no: 8 thread no: 0
I am in block no: 4 thread no: 0
I am in block no: 2 thread no: 0
I am in block no: 8 thread no: 0
I am in block no: 4 thread no: 0
I was expecting threadIdx.x to give me 1 as well. Why is it always 0?
You are not running multiple threads per block. This:
function(grid=(10,2),block=(1,1,1))
launches a grid of 10 x 2 blocks, each containing a single thread.
threadIdx.x will be zero in each case, with blockIdx.x varying between 0 and 9 (as shown in your output), and blockIdx.y varying between 0 and 1 (not shown in your output, but the reason there are two outputs per value of blockIdx.x).
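To get 10 blocks of 2 threads each, the two arguments would presumably need to be swapped around, something like function(grid=(10,1), block=(2,1,1)). The sketch below is not PyCUDA code; it is a plain-Python model (the helper names pad3 and launch_indices are made up for illustration) that enumerates the index pairs each launch configuration produces, so the difference can be checked without a GPU. The real execution order on the device is not defined, so only the set of indices is meaningful.

```python
from itertools import product


def pad3(dims):
    """Pad a 1-, 2- or 3-element dimension tuple with 1s to length 3."""
    dims = tuple(dims)
    return dims + (1,) * (3 - len(dims))


def launch_indices(grid, block):
    """Yield (blockIdx, threadIdx) pairs, each as an (x, y, z) tuple.

    Hypothetical helper that mimics how a launch with the given grid and
    block dimensions assigns indices. Ordering here is arbitrary.
    """
    gx, gy, gz = pad3(grid)
    bx, by, bz = pad3(block)
    for b in product(range(gx), range(gy), range(gz)):
        for t in product(range(bx), range(by), range(bz)):
            yield b, t


# The launch from the question: 10 x 2 blocks, one thread per block.
original = list(launch_indices(grid=(10, 2), block=(1, 1, 1)))

# The launch that was probably intended: 10 blocks, two threads per block.
intended = list(launch_indices(grid=(10, 1), block=(2, 1, 1)))

# Distinct threadIdx.x values seen under each configuration.
original_thread_x = sorted({t[0] for _, t in original})
intended_thread_x = sorted({t[0] for _, t in intended})

print("original threadIdx.x values:", original_thread_x)
print("intended threadIdx.x values:", intended_thread_x)
```

Both configurations start 20 threads in total, which is why the original run printed 20 lines; only the second one gives threadIdx.x a value other than 0.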