What is the Hip-Clang equivalent of this CUDA function?
__device__ __forceinline__ uint32_t add_cc(uint32_t a, uint32_t b)
{
uint32_t r;
asm volatile ("add.cc.u32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
return r;
}
I'm porting a CUDA project to HIP-Clang that contains inline PTX assembly. The function is used to implement multi-precision addition in the NVIDIA GPU. I tried:
asm volatile ("add.cc.u32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b)); //invalid instruction
asm volatile ("V_ADD_CO_U32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b)); //invalid operand for instruction
asm volatile ("V_ADD_CO_U32 %0, %1, %2;" : "=v"(r) : "v"(a), "v"(b)); //operands are not valid for this GPU or mode
The target hardware is RX 6800. AMD clang version 14.0.0.
Is RDNA2 the correct instruction set reference?
Is this LLVM user guide to AMDGPU backend an applicable reference?
It turns out the answer is hardware dependent. For my hardware for which the compiler defines
__gfx1030__
the correct syntax isFor earlier architechtures such as
__gfx900__
replacevcc_lo
withvcc
See the discussion on the Rocm Hip Github and this AMD gpu assembly reference.