I have a 3D grid of 3D blocks, and within each block I need to compute the "z" layers sequentially. In other words, I want all (x,y,0) threads to execute first, then all (x,y,1) threads, and so on, layer by layer along the z axis. I know about the function __syncthreads(), but I don't know how to use it to synchronize threads this way.
UPD:
__global__ void Kernel(/* some params */)
{
    // some code
    __syncthreads();
}
This synchronizes all the threads in the block. But I need all the threads with z = 0 to execute first, then all the threads with z = 1, etc.
You can use a simple loop over the z-index and, in each iteration, select which threads do the work.
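In each iteration, only the threads whose z-index matches the loop counter execute the instructions, while the others are idle; at the end of each iteration all threads in the block synchronize. Note that __syncthreads() has to be placed outside the conditional, so that every thread in the block reaches it.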
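The loop described above might be sketched as follows. This is only a minimal illustration, not the asker's actual kernel: the kernel name `layeredKernel`, the block shape `BX`/`BY`/`BZ`, the shared array `layer`, the host helper `run_layered_demo`, and the per-layer work (each layer adds 1 to the value computed by the layer below it) are all made up for the example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block shape, chosen only for this sketch.
constexpr int BX = 8, BY = 8, BZ = 4;

__global__ void layeredKernel(int *out)
{
    // One slot per thread of the block; layer k reads what layer k-1 wrote.
    __shared__ int layer[BZ][BY][BX];

    const int x = threadIdx.x, y = threadIdx.y, z = threadIdx.z;

    for (int k = 0; k < blockDim.z; ++k) {
        if (z == k) {
            // Only the threads of layer k work in this iteration.
            layer[z][y][x] = (k == 0) ? x + y : layer[z - 1][y][x] + 1;
        }
        // Outside the if: every thread of the block reaches the barrier,
        // so layer k is complete before layer k+1 starts.
        __syncthreads();
    }

    const int threadsPerBlock = BX * BY * BZ;
    const int blockId = blockIdx.x + gridDim.x * (blockIdx.y + gridDim.y * blockIdx.z);
    out[blockId * threadsPerBlock + (z * BY + y) * BX + x] = layer[z][y][x];
}

// Launches the kernel on a 2x2x2 grid and checks that every thread
// ended up with x + y + z, which only holds if the layers ran in order.
bool run_layered_demo()
{
    const dim3 grid(2, 2, 2), block(BX, BY, BZ);
    const int threadsPerBlock = BX * BY * BZ;
    const int n = 2 * 2 * 2 * threadsPerBlock;

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;

    layeredKernel<<<grid, block>>>(d_out);

    std::vector<int> h_out(n);
    const cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                       cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int b = 0; b < 8; ++b)
        for (int z = 0; z < BZ; ++z)
            for (int y = 0; y < BY; ++y)
                for (int x = 0; x < BX; ++x)
                    if (h_out[b * threadsPerBlock + (z * BY + y) * BX + x] != x + y + z)
                        return false;
    return true;
}

int main()
{
    const bool ok = run_layered_demo();
    std::printf("%s\n", ok ? "layers executed in order" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Keep in mind that this only orders the layers within a single block; __syncthreads() does not synchronize threads across different blocks of the grid.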