Thread synchronization in CUDA

I have a 3D grid of 3D blocks, and within each block I need to process the "z" layers sequentially. In other words, I want to execute all (x,y,0) threads first, then all (x,y,1) threads, and so on, layer by layer along the z axis. I know about the __syncthreads() function, but I don't know how to use it to synchronize threads the way I want.

UPD:

 __global__ void Kernel(/* some params */)
 {
      // some code
      __syncthreads();
 }

This synchronizes all the threads in the block. But I need to execute all the threads where z = 0, then all the threads where z = 1, etc.

There are 2 answers

5
Noel (BEST ANSWER)

You can use a simple loop, and specify the threads you want to do the work in each iteration. Something like:

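// zmax is the number of z layers in the block, e.g. blockDim.z
for (int z = 0; z < zmax; z++) {
    if (threadIdx.z == z) {
        // do whatever with x and y
    }
    // Keep the barrier outside the if: every thread in the block must reach it
    __syncthreads();
}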

In each iteration, threads with a specific z-index execute the instructions, while the others are idle; at the end of each iteration all threads synchronize.
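To make the loop above concrete, here is a minimal, self-contained sketch. The kernel name layeredKernel, the 8x8x4 block size, and the "previous layer plus one" computation are all assumptions chosen for illustration; they are not from the question. Each layer reads a value written by a different thread of the previous layer, so the result is only correct if the layers really run in order. It uses a single block, since __syncthreads() only orders threads within one block.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: layer z reads a value written by a neighbouring
// thread of layer z-1, so layers must be processed in order.
__global__ void layeredKernel(int *data)
{
    const int nx = blockDim.x, ny = blockDim.y;
    const int x = threadIdx.x, y = threadIdx.y;

    for (int z = 0; z < (int)blockDim.z; z++) {
        if ((int)threadIdx.z == z) {
            const int idx = (z * ny + y) * nx + x;
            if (z == 0) {
                data[idx] = 1;
            } else {
                const int xn = (x + 1) % nx;  // written by another thread
                data[idx] = data[((z - 1) * ny + y) * nx + xn] + 1;
            }
        }
        // Outside the if: all threads of the block must reach the barrier.
        __syncthreads();
    }
}

int main()
{
    const dim3 block(8, 8, 4);  // assumed size, 256 threads in one block
    const int n = block.x * block.y * block.z;

    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    layeredKernel<<<1, block>>>(d);

    std::vector<int> h(n);
    // cudaMemcpy waits for the kernel and also reports launch errors.
    if (cudaMemcpy(h.data(), d, n * sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        fprintf(stderr, "kernel or copy failed\n");
        return 1;
    }
    cudaFree(d);

    // Every element of layer z should hold z + 1.
    for (int z = 0; z < (int)block.z; z++)
        for (int i = 0; i < (int)(block.x * block.y); i++)
            assert(h[z * block.x * block.y + i] == z + 1);

    printf("layers processed in order\n");
    return 0;
}
```

This should build with something like nvcc layered.cu -o layered (the file name is arbitrary). Note that the ordering holds per block only; layers in different blocks of the grid are not ordered relative to each other.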

5
Avi Ginsburg

__syncthreads() synchronizes only the threads within a single block. If you insist on using __syncthreads(), you would have to ensure that the block size is the same as a(x*y), which is not necessarily possible, and even then the order in which the blocks run isn't guaranteed. An alternative way of achieving your goal would be to launch one kernel per layer and synchronize between kernels (i.e. prevent the kernels from running concurrently). Whether that is feasible depends on what exactly your kernel does and whether it can be broken up like that.
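A rough sketch of the kernel-per-layer approach follows. The kernel name layerKernel, the 64x64x4 problem size, and the computation are assumptions made up for illustration. It relies on the fact that kernels launched into the same stream (here the default stream) execute in launch order, so layer z only starts after layer z-1 has finished across the whole grid.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: processes one z layer of an nx * ny * nz volume.
// Layer z reads layer z-1 at a neighbouring x, possibly from another block.
__global__ void layerKernel(int *data, int nx, int ny, int z)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;

    const int idx = (z * ny + y) * nx + x;
    if (z == 0) {
        data[idx] = 1;
    } else {
        const int xn = (x + 1) % nx;
        data[idx] = data[((z - 1) * ny + y) * nx + xn] + 1;
    }
}

int main()
{
    const int nx = 64, ny = 64, nz = 4;  // assumed problem size
    const int n = nx * ny * nz;

    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    const dim3 block(16, 16);
    const dim3 grid((nx + block.x - 1) / block.x,
                    (ny + block.y - 1) / block.y);

    // One launch per layer. Launches in the same stream run in order,
    // so no explicit synchronization is needed between them.
    for (int z = 0; z < nz; z++)
        layerKernel<<<grid, block>>>(d, nx, ny, z);

    std::vector<int> h(n);
    // cudaMemcpy waits for all preceding kernels in the default stream.
    if (cudaMemcpy(h.data(), d, n * sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        fprintf(stderr, "kernel or copy failed\n");
        return 1;
    }
    cudaFree(d);

    // Every element of layer z should hold z + 1.
    for (int z = 0; z < nz; z++)
        for (int i = 0; i < nx * ny; i++)
            assert(h[z * nx * ny + i] == z + 1);

    printf("layers processed in order\n");
    return 0;
}
```

Compared with the in-block loop, this orders layers across all blocks of the grid, at the cost of one kernel launch per layer.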