Suppose I have an active CUDA context associated with device i
, and I now call cudaSetDevice(i)
. What happens? :
- Nothing?
- Primary context replaces the top of the stack?
- Primary context is pushed onto the stack?
It actually seems to be inconsistent. I've written this program, running on a machine with a single device:
#include <cuda.h>
#include <cuda_runtime_api.h>
#include <cassert>
#include <iostream>
int main()
{
CUcontext ctx1, primary;
cuInit(0);
auto status = cuCtxCreate(&ctx1, 0, 0);
assert (status == (CUresult) cudaSuccess);
cuCtxPushCurrent(ctx1);
status = cudaSetDevice(0);
assert (status == cudaSuccess);
void* ptr1;
void* ptr2;
cudaMalloc(&ptr1, 1024);
assert (status == cudaSuccess);
cuCtxGetCurrent(&primary);
assert (status == (CUresult) cudaSuccess);
assert(primary != ctx1);
status = cuCtxPushCurrent(ctx1);
assert (status == (CUresult) cudaSuccess);
cudaMalloc(&ptr2, 1024);
assert (status == (CUresult) cudaSuccess);
cudaSetDevice(0);
assert (status == (CUresult) cudaSuccess);
int i = 0;
while (true) {
status = cuCtxPopCurrent(&primary);
if (status != (CUresult) cudaSuccess) { break; }
std::cout << "Next context on stack (" << i++ << ") is " << (void*) primary << '\n';
}
}
and I get the following output:
context ctx1 is 0x563ec6225e30
primary context is 0x563ec61f5490
Next context on stack (0) is 0x563ec61f5490
Next context on stack (1) is 0x563ec61f5490
Next context on stack(2) is 0x563ec6225e3
This seems like the behavior is sometimes a replacement, and sometimes a push.
What's going on?
TL;DR: Based on the code you have provided, in both instances of your particular usage, it seems that
cudaSetDevice()
is replacing the context at the top of the stack.Let's modify your code a bit, and then see what we can infer about the effect of each API call in your code on the context stack:
Based on the above, as we proceed through each API call in your code:
1.
The context creation also pushes the newly created context on the stack, as mentioned here.
2.
No surprise, pushing the same context on the stack creates another stack entry for it.
3.
The
cudaSetDevice()
call has replaced the top of the stack with an "unknown" context. (Only unknown at this point because we have not retrieved the handle value of the "other" context).4.
No difference in stack configuration due to this call.
5.
No difference in stack configuration due to this call, but we now know that the top of stack context is the current context (and we can surmise it is the primary context).
6.
No real surprise here. We are pushing
ctx1
on the stack, and so the stack has 3 entries, the first one being the driver API created context, and the next two entries being the same as the stack configuration from step 5, just moved down one stack location.7.
Again, this call has no effect on stack configuration.
8.
Once again, we see that the behavior here is that the
cudaSetDevice()
call has replaced the top of stack context with the primary context.The conclusion I have from your test code is that I see no inconsistency of behavior of the
cudaSetDevice()
call when intermixed with various runtime and driver API calls as you have in your code.From my perspective, this sort of programming paradigm is insanity. I can't imagine why you would want to intermix driver API and runtime API code this way.