I'm working on a CUDA program in which the ALU is fully utilized (almost 100% compute throughput). The program performs a lot of XOR operations, among others. Is it possible to offload the XORs to the floating-point engine? As far as I know, IMAD instructions are executed not in the ALU but in the FPU. In other words, can we replace a XOR b with something like a*c + b (where c is some magic constant), or even with a sequence of 2-3 IMAD (integer multiply-add) instructions?
UPDATE: In response to the comments: a and b are 32-bit unsigned or signed integers treated as plain 32-bit words, i.e. the XOR is a full 32-bit bitwise XOR.
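To make the proposed replacement concrete, here is a minimal host-side sketch (plain C++, not a tuned kernel) of the two forms I have in mind. The function names are hypothetical and only for illustration. The first part tests the single multiply-add form a*c + b exhaustively, but on 8-bit values rather than 32-bit ones so that the search is cheap; that reduced width is an assumption of the sketch. The second part is the well-known identity a XOR b = a + b - 2*(a AND b), which is a multiply-add shape but still needs an AND, so on its own it does not obviously take any load off the ALU.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single multiply-add candidate: a * c + b, wrapping modulo 2^8.
// 8-bit operands are an assumption made to keep the exhaustive check small.
inline std::uint8_t imad_candidate(std::uint8_t a, std::uint8_t b, std::uint8_t c) {
    return static_cast<std::uint8_t>(a * c + b);
}

// True if a * c + b equals a ^ b for every 8-bit pair (a, b).
inline bool constant_reproduces_xor(std::uint8_t c) {
    for (unsigned a = 0; a < 256; ++a) {
        for (unsigned b = 0; b < 256; ++b) {
            const std::uint8_t x = static_cast<std::uint8_t>(a);
            const std::uint8_t y = static_cast<std::uint8_t>(b);
            if (imad_candidate(x, y, c) != static_cast<std::uint8_t>(x ^ y)) {
                return false;
            }
        }
    }
    return true;
}

// Counts how many 8-bit constants c make the single multiply-add form exact.
inline int count_working_constants() {
    int count = 0;
    for (unsigned c = 0; c < 256; ++c) {
        if (constant_reproduces_xor(static_cast<std::uint8_t>(c))) {
            ++count;
        }
    }
    return count;
}

// Known identity: a ^ b == a + b - 2 * (a & b), with 32-bit wraparound.
// Note that it still contains a bitwise AND.
inline std::uint32_t xor_via_add_and(std::uint32_t a, std::uint32_t b) {
    return a + b - 2u * (a & b);
}
```

Calling count_working_constants() from main reports how many constants work for the 8-bit case. The choice b = 0 forces a*c = a for every a, so c must be 1, and then a = b = 1 gives 2 instead of 0; this suggests the single multiply-add form cannot be exact, and I would expect the same argument to carry over to 32 bits.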