Is there a bug in CUDA? I ran the following code on my GTX 580, and r2 is zero at the end. I expected it to be one because of carry propagation. I have tested the code with CUDA Toolkit 4.2.9 and 5.5, and I use "nvcc -arch=sm_20 bug.cu -o bug && ./bug" to compile and run it.
#include <stdio.h>
#include <cuda.h>

__global__ void bug()
{
    unsigned int r1 = 0;
    unsigned int r2 = 0;

    // sub.cc computes r1 = 0 - 1 and writes the carry flag (CC.CF);
    // addc.cc computes r2 = 0 + 0 + CC.CF.
    asm( "\n\t"
         "sub.cc.u32 %0, 0, 1;\n\t"
         "addc.cc.u32 %1, 0, 0;\n\t"
         : "=r"(r1), "=r"(r2) );

    printf("r1 >> %04X\n", r1);
    printf("r2 >> %04X\n", r2);
}

int main(void)
{
    bug <<< 1,1 >>> ();
    // Wait for the kernel so the device-side printf output is flushed.
    cudaDeviceSynchronize();
    return 0;
}
Output
r1 >> FFFFFFFF
r2 >> 0000
I think I have found an explanation. The PTX manual has a note on the sub.cc instruction: "Behavior is the same for unsigned and signed integers."
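To make the two possible readings of the carry flag concrete, here is a small host-side sketch. It is only a model and not a statement of what the GPU or the PTX specification actually does: the names sub_cc_borrow, sub_cc_complement and addc are hypothetical helpers I made up for illustration. The first model assumes sub.cc stores a borrow in the flag, which is what I expected. The second assumes the subtraction is carried out as a + ~b + 1 and the flag holds the carry-out of that addition.

```cpp
#include <cstdint>

// Result of a 32-bit operation plus the modeled carry flag (CC.CF).
struct FlagResult {
    uint32_t value;
    uint32_t cf;
};

// Hypothetical model A (assumption): the flag holds the borrow,
// i.e. it is 1 exactly when a < b.
FlagResult sub_cc_borrow(uint32_t a, uint32_t b) {
    return { a - b, a < b ? 1u : 0u };
}

// Hypothetical model B (assumption): the subtraction is performed as
// a + ~b + 1 and the flag holds the carry-out of that addition.
FlagResult sub_cc_complement(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a + (uint64_t)(uint32_t)~b + 1u;
    return { (uint32_t)wide, (uint32_t)(wide >> 32) };
}

// Model of addc: d = a + b + carry flag.
uint32_t addc(uint32_t a, uint32_t b, uint32_t cf) {
    return a + b + cf;
}
```

For 0 - 1 both models give FFFFFFFF for r1. Feeding the flag into addc(0, 0, cf) gives 1 under model A and 0 under model B, so the output above is consistent with model B but does not prove it.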