I am confused about how to use cub::DeviceReduce::ArgMin(). Here is the code from the CUB documentation:
#include <cub/cub.cuh>
// Declare, allocate, and initialize device-accessible pointers for input and output
int num_items; // e.g., 7
int *d_in; // e.g., [8, 6, 7, 5, 3, 0, 9], located in GPU
KeyValuePair<int, int> *d_out; // e.g., [{-,-}]
// Determine temporary device storage requirements
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceReduce::ArgMin(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run argmin-reduction
cub::DeviceReduce::ArgMin(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
// d_out <-- [{5, 0}]
My questions are as follows:
- If d_in is a pointer to GPU (device) memory, how do I initialize the d_out pointer?
- If ArgMin() runs entirely on the device (GPU), how can I copy the result back to the CPU?
- You use cudaMalloc, similar to how you would initialize the d_in pointer.
- You use cudaMemcpy, similar to how you would copy the d_in data from host to device, except now you are copying the d_out data from device to host. The KeyValuePair is a C++ object that has key and value members.
Here is a complete example:
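As a minimal sketch of those two steps: the helper name argmin_on_device is hypothetical (it is not part of CUB), error checking is omitted for brevity, and the code assumes a CUB version that still provides the KeyValuePair overload of ArgMin quoted in the question.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of CUB): runs ArgMin over a host vector
// and returns the {index, value} pair on the host.
cub::KeyValuePair<int, int> argmin_on_device(const std::vector<int>& h_in) {
    assert(!h_in.empty());
    const int num_items = static_cast<int>(h_in.size());

    // Input buffer on the device, filled from the host.
    int* d_in = nullptr;
    cudaMalloc(&d_in, num_items * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), num_items * sizeof(int),
               cudaMemcpyHostToDevice);

    // Output buffer: a single KeyValuePair, allocated with cudaMalloc
    // exactly like d_in.
    cub::KeyValuePair<int, int>* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(cub::KeyValuePair<int, int>));

    // First call only reports how much temporary storage is needed.
    void* d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;
    cub::DeviceReduce::ArgMin(d_temp_storage, temp_storage_bytes,
                              d_in, d_out, num_items);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);

    // Second call performs the reduction on the device.
    cub::DeviceReduce::ArgMin(d_temp_storage, temp_storage_bytes,
                              d_in, d_out, num_items);

    // Copy the single result pair back to the host.
    cub::KeyValuePair<int, int> h_out;
    cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    cudaFree(d_temp_storage);
    cudaFree(d_out);
    cudaFree(d_in);
    return h_out;
}
```

For the documentation's sample input [8, 6, 7, 5, 3, 0, 9], the returned pair has key 5 (the index of the minimum) and value 0 (the minimum itself).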