The behavior of stream 0 (default) and other streams


In CUDA, how is stream 0 related to other streams? Does stream 0 (default stream) execute concurrently with other streams in a context or not?

Consider the following example:

cudaMemcpy(Dst, Src, sizeof(float)*datasize, cudaMemcpyHostToDevice);//stream 0;

cudaStream_t stream1;

/* ...creating stream1... */

somekernel<<<blocks, threads, 0, stream1>>>(Dst);//stream 1;

In the above code, is somekernel guaranteed to launch only AFTER cudaMemcpy finishes, or can somekernel execute concurrently with cudaMemcpy?

Best answer, by talonmies

A cudaMemcpy call is (in all but a particular case) a synchronous call. The host thread running that code blocks until the memory transfer has completed. It cannot proceed to launch the kernel until the cudaMemcpy call has returned, and that doesn't happen until the copy operation is complete.
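To make the ordering concrete, here is a minimal sketch that fills in the question's example so it compiles. The kernel body, the CHECK macro, the buffer size, and the launch configuration are all assumptions made for illustration; only the cudaMemcpy followed by a launch into stream1 comes from the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error-check helper (not part of CUDA itself).
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Hypothetical stand-in for the question's somekernel: doubles each element.
__global__ void somekernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int datasize = 1 << 20;               // assumed size
    const size_t bytes = sizeof(float) * datasize;

    float *Src = (float *)malloc(bytes);
    if (Src == NULL) return EXIT_FAILURE;
    for (int i = 0; i < datasize; ++i) Src[i] = 1.0f;

    float *Dst = NULL;
    CHECK(cudaMalloc(&Dst, bytes));

    cudaStream_t stream1;
    CHECK(cudaStreamCreate(&stream1));

    // Issued to the default stream. The host thread blocks in this call,
    // so the kernel launch below cannot even be issued until it returns.
    CHECK(cudaMemcpy(Dst, Src, bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (datasize + threads - 1) / threads;
    somekernel<<<blocks, threads, 0, stream1>>>(Dst, datasize);
    CHECK(cudaGetLastError());

    // Wait for stream1 explicitly before reading the result back, so the
    // sketch does not rely on any implicit default-stream ordering.
    CHECK(cudaStreamSynchronize(stream1));
    CHECK(cudaMemcpy(Src, Dst, bytes, cudaMemcpyDeviceToHost));
    printf("Src[0] = %.1f\n", Src[0]);

    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaFree(Dst));
    free(Src);
    return 0;
}
```

This should build with something like nvcc example.cu. If the copy were issued with cudaMemcpyAsync instead, the host would no longer block on it, and the ordering would depend on stream semantics alone.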

More generally, the default stream (0 or null) implicitly serializes operations on the GPU whenever an operation is active in that stream. If you create streams and push operations into them while an operation is being performed in the default stream, all concurrency in those streams is lost until the default stream is idle.
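One rough way to observe this serialization is to time two busy-wait kernels in separate streams, once on their own and once with a trivial default-stream launch issued between them. The sketch below does that under several assumptions: the spin and noop kernels, the CHECK macro, and the cycle count are made up for illustration, the streams are created with plain cudaStreamCreate, and the program is compiled with the default (legacy) default-stream behavior. No timings are claimed here; they depend on the device and on whether it supports concurrent kernels.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error-check helper (not part of CUDA itself).
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Busy-waits for roughly `cycles` clock ticks on one thread.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Does nothing; used only to put an operation into the default stream.
__global__ void noop() { }

// Launches spin in streams a and b, optionally with a default-stream
// launch in between, and returns the elapsed wall-clock time in ms.
static float timed_run(cudaStream_t a, cudaStream_t b,
                       bool touch_default, long long cycles)
{
    CHECK(cudaDeviceSynchronize());
    auto t0 = std::chrono::steady_clock::now();

    spin<<<1, 1, 0, a>>>(cycles);
    if (touch_default) noop<<<1, 1>>>();   // goes to the default stream
    spin<<<1, 1, 0, b>>>(cycles);

    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<float, std::milli>(t1 - t0).count();
}

int main()
{
    cudaStream_t a, b;
    CHECK(cudaStreamCreate(&a));
    CHECK(cudaStreamCreate(&b));

    const long long cycles = 200000000LL;  // arbitrary spin length

    timed_run(a, b, false, 1000);          // warm-up, result discarded
    float ms_without = timed_run(a, b, false, cycles);
    float ms_with    = timed_run(a, b, true,  cycles);

    printf("streams only:             %.1f ms\n", ms_without);
    printf("with default-stream work: %.1f ms\n", ms_with);

    CHECK(cudaStreamDestroy(a));
    CHECK(cudaStreamDestroy(b));
    return 0;
}
```

If the two spin kernels can overlap on the device, the second measurement would be expected to come out noticeably longer than the first, because the default-stream launch has to wait for the kernel in stream a and the kernel in stream b has to wait for the default-stream launch.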