Does a cudnnCreate() call create multiple streams internally?


I am writing a simple multi-stream CUDA application. The following is the part of the code where I create the CUDA streams, cuBLAS handles, and cuDNN handles:

cudaSetDevice(0);

const int num_streams = 1;  // const, so the arrays below have a compile-time size

cudaStream_t streams[num_streams];
cudnnHandle_t mCudnnHandle[num_streams];
cublasHandle_t mCublasHandle[num_streams];

for (int ii = 0; ii < num_streams; ii++) {
    cudaStreamCreateWithFlags(&streams[ii], cudaStreamNonBlocking);
    cublasCreate(&mCublasHandle[ii]);
    cublasSetStream(mCublasHandle[ii], streams[ii]);
    cudnnCreate(&mCudnnHandle[ii]);
    cudnnSetStream(mCudnnHandle[ii], streams[ii]);
}

My stream count is 1, but when I profile the executable of the above application with the NVIDIA Visual Profiler, I get the following:

[Screenshot: NVIDIA Visual Profiler output]

For every stream I create it creates additional 4 more streams. I tested it with num_streams = 8, and the profiler showed 40 streams. This raised the following questions:

  1. Does cudnn internally create streams? If yes, then why?
  2. If it implicitly creates streams then what is the way to utilize it?
  3. In such case does explicitly creating streams make any sense?

Answer by Robert Crovella (best answer)

Does cudnn internally create streams?

Yes.

If yes, then why?

Because it is a library, and it may need to organize CUDA concurrency. Streams are used to organize CUDA concurrency. The library internals are not documented, so a detailed explanation of exactly what those streams are used for is not available.

If it implicitly creates streams then what is the way to utilize it?

Those streams are not intended for you to utilize separately/independently. They are for use by the library, internal to its routines.

In such case does explicitly creating streams make any sense?

You would still need to explicitly create any streams you needed to manage CUDA concurrency outside of the library usage.

I would like to point out that this statement is a bit misleading:

"For every stream I create it creates additional 4 more streams."

What you are doing is going through a loop, and at each loop iteration you are creating a new handle. Your observation is tied to the number of handles you create, not the number of streams you create.