I have some work I want to do on a CUDA stream, say a kernel `K`, which depends on previous work that needs to be done on the CPU. The exact details of the CPU work are not known to me when I'm scheduling `K`; I just want `K` not to start until it is given an indication that everything is ready.
Now, if I knew exactly what CPU work is to be done, e.g. that `K` could start after some function `foo()` concludes, I could do the following (a minimal sketch follows the list):

- Enqueue a call to `foo()` on stream `SideStream`
- Enqueue an event `E1` on `SideStream`
- Enqueue a wait on event `E1` on `MainStream`
- Enqueue `K` on `MainStream`
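In CUDA runtime API terms, I imagine that would look roughly like this; `cudaLaunchHostFunc` as the mechanism for enqueueing the CPU work, and the names `scheduleK`/`fooCallback`, are just assumptions of mine for illustration:

```cpp
// Minimal sketch of the four steps above, assuming foo() and K exist and that
// cudaLaunchHostFunc is how CPU work gets enqueued on a stream.
#include <cuda_runtime.h>

void foo();           // the known CPU work (assumed defined elsewhere)
__global__ void K();  // the dependent kernel (assumed defined elsewhere)

void CUDART_CB fooCallback(void*) { foo(); }

void scheduleK(cudaStream_t MainStream, cudaStream_t SideStream) {
    cudaEvent_t E1;
    cudaEventCreateWithFlags(&E1, cudaEventDisableTiming);

    cudaLaunchHostFunc(SideStream, fooCallback, nullptr);  // foo() on SideStream
    cudaEventRecord(E1, SideStream);                        // record E1 on SideStream
    cudaStreamWaitEvent(MainStream, E1, 0);                 // MainStream waits on E1
    K<<<1, 1, 0, MainStream>>>();                           // K on MainStream

    cudaEventDestroy(E1);  // resources are released asynchronously once E1 completes
}
```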
But what if my CUDA scheduling code doesn't have access to such a `foo()`? I want to allow some other, arbitrary place in my code to fire `E1` when it is good and ready, and have that trigger `K` on `MainStream`... but I can't do that, since in CUDA you can only wait on an already-enqueued (already "recorded") event.
This seems to be one of the few niches in which OpenCL offers a richer API than CUDA's: "User Events". They can be waited upon, and their execution completion status can be set by the user. See:
- https://registry.khronos.org/OpenCL/sdk/3.0/docs/man/html/clCreateUserEvent.html
- https://registry.khronos.org/OpenCL/sdk/3.0/docs/man/html/clSetUserEventStatus.html
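For concreteness, this is roughly how I understand that would be used; the function names and the parameters being passed in are just mine, for illustration:

```cpp
#include <CL/cl.h>

// Hypothetical sketch: enqueue the kernel right away, but gate it on a user
// event that arbitrary host code completes later. context, queue, kernel and
// global_size are assumed to have been set up elsewhere.
cl_event gateKernelOnUserEvent(cl_context context, cl_command_queue queue,
                               cl_kernel kernel, size_t global_size) {
    cl_int err = 0;
    cl_event user_ev = clCreateUserEvent(context, &err);

    // The kernel will not start until user_ev is set to CL_COMPLETE.
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global_size, nullptr,
                           1, &user_ev, nullptr);
    return user_ev;
}

// ... elsewhere, whenever the CPU-side work is actually done:
void releaseKernel(cl_event user_ev) {
    clSetUserEventStatus(user_ev, CL_COMPLETE);
    clReleaseEvent(user_ev);
}
```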
But surely CUDA must be able to provide something like this itself, if only so that these OpenCL API calls can be implemented on top of it. So, what is the idiomatic way to achieve this effect with CUDA?
Here's a possible idea - based on @AbatorAbetor's comment, although I have no idea if that's what people use in practice.
Write a `foo()` which takes a condition variable as a parameter and waits on that variable; you can use `std::condition_variable`, for example. Now proceed as in your question, as you have exactly the function you were missing:
- Enqueue a call to `foo()` on stream `SideStream`
- Enqueue an event `E1` on `SideStream`
- Enqueue a wait on event `E1` on `MainStream`
- Enqueue `K` on `MainStream`

But you are not quite done: your scheduler now passes the condition variable (while keeping it alive!) onwards/outwards, so that finally, the "CPU work" you mentioned has a reference to it. When it is done, all it needs to do is a notify operation on the condition variable: this will wake up
`foo()`, which then immediately triggers `E1` and then `K`.

Caveat: I am assuming that letting a CUDA callback function block like this doesn't interfere with other CUDA runtime/driver work.
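A rough sketch of what I have in mind follows; the `Gate` struct (which just bundles the condition variable with its mutex and a ready flag), the names `fooWait`/`scheduleK`/`releaseK`, and the use of `cudaLaunchHostFunc` to enqueue `foo()` on `SideStream` are all illustrative choices of mine, not the only way to do it:

```cpp
#include <cuda_runtime.h>

#include <condition_variable>
#include <memory>
#include <mutex>

__global__ void K();  // the dependent kernel (assumed defined elsewhere)

// Bundles the condition variable with its mutex and a flag, so that a notify
// arriving before the callback starts waiting is not lost.
struct Gate {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
};

// This is the foo() of the answer: it blocks inside the stream callback
// until someone notifies the gate.
void CUDART_CB fooWait(void* userData) {
    auto* gate = static_cast<Gate*>(userData);
    std::unique_lock<std::mutex> lock(gate->m);
    gate->cv.wait(lock, [gate] { return gate->ready; });
}

// Returns the gate so that arbitrary CPU code elsewhere can release K later;
// the caller must keep the returned pointer alive until after the notify.
std::shared_ptr<Gate> scheduleK(cudaStream_t MainStream, cudaStream_t SideStream) {
    auto gate = std::make_shared<Gate>();

    cudaLaunchHostFunc(SideStream, fooWait, gate.get());  // foo() on SideStream
    cudaEvent_t E1;
    cudaEventCreateWithFlags(&E1, cudaEventDisableTiming);
    cudaEventRecord(E1, SideStream);                       // E1 on SideStream
    cudaStreamWaitEvent(MainStream, E1, 0);                // wait on E1 on MainStream
    K<<<1, 1, 0, MainStream>>>();                          // K on MainStream
    cudaEventDestroy(E1);                                  // released once complete

    return gate;
}

// The "CPU work" calls this whenever it is good and ready.
void releaseK(Gate& gate) {
    {
        std::lock_guard<std::mutex> lock(gate.m);
        gate.ready = true;
    }
    gate.cv.notify_one();
}
```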