I have tried to change the current device in CUDA graphs by creating this host node:
    cudaGraph_t graph;
    // Node #1: Create the 1st setDevice
    cudaHostNodeParams hostNodeParams = {0};
    memset(&hostNodeParams, 0, sizeof(hostNodeParams));
    hostNodeParams.fn = [](void *data) {
        int passed_device_ordinal = *(int *)(data);
        cout << "CUDA-Graph: in the host node: changing the device to: "
             << passed_device_ordinal << endl;
        CUDA_CHECK(cudaSetDevice(passed_device_ordinal));
    };
    hostNodeParams.userData = (void *)&device_1;
    // Node #1: Add the 1st setDevice
    CUDA_CHECK(cudaGraphAddHostNode(&setDevice_1, graph, &copy_0to1, 1,
                                    &hostNodeParams));
When running the code, I get this output:
CUDA-Graph: in the host node: changing the device to: 1
Error operation not permitted at line 68 in file src/MultiGPU.cu
Is it possible to change the device within a CUDA graph?
During graph execution, the current device cannot be changed from a host callback, because host callbacks are not allowed to make CUDA API calls. That is why the cudaSetDevice call inside the host node fails with "operation not permitted".
There are two ways to specify the device on which a kernel within the graph will execute:
1. Use stream capture to create a multi-GPU graph.
2. When constructing the graph manually, call cudaSetDevice before adding each kernel node. Nodes are assigned to the device that is current when they are added.
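As a smaller illustration of the manual-construction option alone, here is a minimal sketch. It assumes at least two GPUs and CUDA 11.4 or newer (for cudaGraphInstantiateWithFlags). The kernel writeValue, the buffers, and the CUDA_CHECK macro are hypothetical stand-ins, not code from the question. The point is only that cudaSetDevice is called on the host while the graph is being built, not inside a host node.

```cuda
// Sketch only: assumes at least two CUDA devices and CUDA 11.4 or newer.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro, standing in for the question's CUDA_CHECK.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "Error %s at line %d in file %s\n",     \
                    cudaGetErrorString(err_), __LINE__, __FILE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Hypothetical kernel: writes one value so each device's work is observable.
__global__ void writeValue(int *out, int value) { *out = value; }

int main() {
    int deviceCount = 0;
    CUDA_CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    cudaGraph_t graph;
    CUDA_CHECK(cudaGraphCreate(&graph, 0));

    int *d_out[2] = {nullptr, nullptr};
    int values[2] = {100, 101};
    cudaGraphNode_t kernelNodes[2];

    for (int dev = 0; dev < 2; ++dev) {
        // Select the device BEFORE adding the node: per the answer above, the
        // node is assigned to whichever device is current at this point.
        CUDA_CHECK(cudaSetDevice(dev));
        CUDA_CHECK(cudaMalloc(&d_out[dev], sizeof(int)));

        void *args[] = {&d_out[dev], &values[dev]};
        cudaKernelNodeParams params = {};
        params.func = (void *)writeValue;
        params.gridDim = dim3(1);
        params.blockDim = dim3(1);
        params.sharedMemBytes = 0;
        params.kernelParams = args;
        params.extra = nullptr;

        // Chain the nodes: the kernel on device 1 depends on the one on device 0.
        const cudaGraphNode_t *deps = (dev == 0) ? nullptr : &kernelNodes[dev - 1];
        size_t numDeps = (dev == 0) ? 0 : 1;
        CUDA_CHECK(cudaGraphAddKernelNode(&kernelNodes[dev], graph, deps,
                                          numDeps, &params));
    }

    cudaGraphExec_t exec;
    CUDA_CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    // Launch the whole multi-GPU graph from a single stream on device 0.
    CUDA_CHECK(cudaSetDevice(0));
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));
    CUDA_CHECK(cudaGraphLaunch(exec, stream));
    CUDA_CHECK(cudaStreamSynchronize(stream));

    // Read back what each device wrote, then release its buffer.
    for (int dev = 0; dev < 2; ++dev) {
        int h = 0;
        CUDA_CHECK(cudaSetDevice(dev));
        CUDA_CHECK(cudaMemcpy(&h, d_out[dev], sizeof(int),
                              cudaMemcpyDeviceToHost));
        printf("device %d wrote %d\n", dev, h);
        CUDA_CHECK(cudaFree(d_out[dev]));
    }

    CUDA_CHECK(cudaGraphExecDestroy(exec));
    CUDA_CHECK(cudaGraphDestroy(graph));
    CUDA_CHECK(cudaStreamDestroy(stream));
    return 0;
}
```

No host node is needed to switch devices here: the device is fixed per node at construction time, and the dependency edge between the two kernel nodes orders the work across GPUs.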
The following code demonstrates both approaches with a simple pipeline that executes a kernel, a memcpy to host, and a host callback on each GPU.