HIP-Clang Inline Assembly

834 views Asked by At

What is the Hip-Clang equivalent of this CUDA function?

__device__ __forceinline__ uint32_t add_cc(uint32_t a, uint32_t b)
{
   uint32_t r;
   asm volatile ("add.cc.u32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
   return r;
}

I'm porting a CUDA project to HIP-Clang that contains inline PTX assembly. The function is used to implement multi-precision addition in the NVIDIA GPU. I tried:

asm volatile ("add.cc.u32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b)); //invalid instruction
asm volatile ("V_ADD_CO_U32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b)); //invalid operand for instruction
asm volatile ("V_ADD_CO_U32 %0, %1, %2;" : "=v"(r) : "v"(a), "v"(b)); //operands are not valid for this GPU or mode

The target hardware is RX 6800. AMD clang version 14.0.0.
Is RDNA2 the correct instruction set reference?
Is this LLVM user guide to AMDGPU backend an applicable reference?

1

There are 1 answers

0
Andrew H On BEST ANSWER

It turns out the answer is hardware dependent. For my hardware for which the compiler defines __gfx1030__ the correct syntax is

asm volatile ("v_add_co_u32 %0, vcc_lo, %1, %2;" : "=v"(r) : "v"(a), "v"(b));

For earlier architechtures such as __gfx900__ replace vcc_lo with vcc
See the discussion on the Rocm Hip Github and this AMD gpu assembly reference.