Can we replace XOR with multiply-add?


I'm working on a CUDA program where the ALU is fully utilized (almost 100% compute throughput). The program does a lot of XOR operations, among others. Is it possible to offload the XORs to the floating-point engine? As far as I know, IMAD (integer multiply-add) instructions are not executed in the ALU, but rather in the FPU. In other words, can we replace a XOR b with something like a*c + b (where c is some magic constant), or even with 2-3 IMAD instructions?

UPDATE: in response to the comments, a and b are 32-bit integers.
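
To make the question concrete, here is a minimal host-side sketch (plain C++ that should also compile in a .cu file) of what I have checked so far. The helper names `xor_via_add`, `mad_candidate` and `identity_holds_on_samples` are mine and purely illustrative. It shows two things for 32-bit unsigned values with wrap-around arithmetic: the well-known identity a XOR b = a + b - 2*(a AND b), which still needs an AND, and the fact that a single a*c + b with one fixed constant c cannot match XOR for all inputs. It says nothing about which hardware unit executes which instruction.

```cpp
#include <cstdint>

// Arithmetic form of XOR, modulo 2^32: a ^ b == a + b - 2 * (a & b).
// Note that the AND is still a logic operation, so this identity alone
// does not turn XOR into pure multiply-add work.
constexpr std::uint32_t xor_via_add(std::uint32_t a, std::uint32_t b) {
    return a + b - 2u * (a & b);
}

// Candidate from the question: a * c + b with a fixed "magic" constant c.
constexpr std::uint32_t mad_candidate(std::uint32_t a, std::uint32_t b,
                                      std::uint32_t c) {
    return a * c + b;
}

// With a = 1, b = 0 the candidate evaluates to c, and 1 ^ 0 == 1,
// so c would have to be 1 (mod 2^32).
static_assert(mad_candidate(1u, 0u, 1u) == (1u ^ 0u),
              "c = 1 matches XOR for a = 1, b = 0");

// With c = 1, a = 1, b = 1 the candidate gives 2, while 1 ^ 1 == 0,
// so no single constant c works for all inputs.
static_assert(mad_candidate(1u, 1u, 1u) != (1u ^ 1u),
              "c = 1 fails for a = 1, b = 1");

// Spot check of the arithmetic identity on a few sample values,
// including ones that overflow 32 bits when added.
inline bool identity_holds_on_samples() {
    const std::uint32_t samples[] = {0u, 1u, 2u, 0x7FFFFFFFu,
                                     0x80000000u, 0xDEADBEEFu, 0xFFFFFFFFu};
    for (std::uint32_t a : samples) {
        for (std::uint32_t b : samples) {
            if (xor_via_add(a, b) != (a ^ b)) {
                return false;
            }
        }
    }
    return true;
}
```

So the single-IMAD form a*c + b is ruled out, and the open part of the question is whether a short sequence of IMADs can do it without any logic instruction in between.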
