Sometimes, one wants to write a (small) CUDA device-side function which returns two values. In C-like style, you would have that function take two out-parameters, e.g.:
__device__ void pair_maker(float x, float &out1, float& out2);
but in C++, the idiomatic way to write this is to return an std::pair (well, maybe an std::tuple, or a struct, but C++ tuples are clunky and a struct is not generic enough):
__device__ std::pair<float, float> pair_maker(float x);
My question: Can I trust NVCC (with --expt-relaxed-constexpr) to optimize away the construction of the pair, and just assign directly to the variables which I later assign to from the .first and .second elements of the pair?
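To make the call site concrete, here is a minimal sketch of the pattern I mean. The body of `pair_maker` and the `use_pair` wrapper are hypothetical, made up only for illustration; the macro fallback is just there so the same snippet also builds with a plain host compiler.

```cpp
#include <utility>

// Assumption: when not compiling with NVCC, define the CUDA qualifiers
// away so the snippet also compiles as ordinary C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical body; only the signature matters for the question.
__host__ __device__ inline std::pair<float, float> pair_maker(float x)
{
    return { x + 1.0f, x * 2.0f };
}

// Hypothetical call site: the pair is a short-lived temporary whose
// members are immediately copied into separate variables.
__host__ __device__ inline float use_pair(float x)
{
    std::pair<float, float> p = pair_maker(x);
    float a = p.first;
    float b = p.second;
    return a - b;
}
```

The hope is that, after inlining, `p` never exists as an object and `a` and `b` are computed straight into registers.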
I don't have a complete answer, but from my limited experience, it seems that NVCC can optimize the std::pair away. Illustration (also on GodBolt):
The kernels bar() and baz() compile to the same PTX code:
No extra copies or construction-related operations.
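A comparison of this kind can be set up roughly as follows. This is a minimal sketch under my own assumptions: the helper names `pair_version` and `out_param_version` and their bodies are hypothetical, and the kernels assume `out` holds at least two floats per launched thread. The macro fallback only exists so the helpers can be sanity-checked with a host compiler.

```cpp
#include <utility>

// Assumption: define the CUDA qualifiers away for non-NVCC builds.
#ifndef __CUDACC__
#define __host__
#define __device__
#define __global__
#endif

// Variant 1: both results returned in a std::pair.
__host__ __device__ inline std::pair<float, float> pair_version(float x)
{
    return { x * 2.0f, x + 1.0f };
}

// Variant 2: the same results written through out-parameters.
__host__ __device__ inline void out_param_version(float x, float& out1, float& out2)
{
    out1 = x * 2.0f;
    out2 = x + 1.0f;
}

#ifdef __CUDACC__
// Kernel using the pair-returning helper.
__global__ void bar(float* out)
{
    const unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    const std::pair<float, float> p = pair_version(static_cast<float>(i));
    out[2 * i]     = p.first;
    out[2 * i + 1] = p.second;
}

// Kernel using the out-parameter helper.
__global__ void baz(float* out)
{
    const unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    float a, b;
    out_param_version(static_cast<float>(i), a, b);
    out[2 * i]     = a;
    out[2 * i + 1] = b;
}
#endif
```

Compiling with something like `nvcc -ptx --expt-relaxed-constexpr` and diffing the PTX emitted for the two kernels is a way to check this on your own compiler version and optimization settings; I would not assume the result carries over to larger or non-inlined functions.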