The function below calculates the absolute value of a 32-bit floating-point value:
__forceinline static float Abs(float x)
{
union {
float x;
int a;
} u;
//u.x = x;
u.a &= 0x7FFFFFFF;
return u.x;
}
The union u declared in the function holds its own member x, which is different from the x passed in as the function's parameter. Is there any way to create a union over the function's argument x itself?
Is there any reason why the function above, with the commented-out line uncommented, would take longer to execute than this one?
__forceinline float fastAbs(float a)
{
int b= *((int *)&a) & 0x7FFFFFFF;
return *((float *)(&b));
}
I'm trying to figure out the best way to take the absolute value of a floating-point value with as few memory reads and writes as possible.
Looking at the disassembly of the code compiled in release mode, the difference is quite clear. I removed the inline and made the two functions virtual so that the compiler would not optimize too much, which lets the differences show.
This is the first function.
This is the second function.
The first function uses more floating-point operations and makes heavier use of the FPU stack. The functions do exactly what you asked, so that is no surprise. So I expect the second function to be faster.
Now, with the virtual removed and the functions inlined, things are a little different. It is hard to show the disassembly here because the compiler does a good job, but I repeat: if the values are not constants, the compiler will use more floating-point operations in the first function. Of course, integer operations are faster than floating-point operations.
Are you sure that directly using the fabs function from math.h is slower than your method? If correctly inlined, fabs will do just this!
Micro-optimizations like this are hard to see in long code, but if your function is called in a long chain of floating-point operations, fabs will work better, since the values will already be on the FPU stack or in SSE registers. fabs would be faster and better optimized by the compiler.
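As a rough illustration of fabs inside a chain of floating-point operations, here is a minimal sketch. The function name SumAbs and the loop itself are assumptions made for the example, not code from the question. The generated code depends on the compiler and target.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical example: sums the absolute values of n floats.
// std::fabs is applied to each element as part of the floating-point
// accumulation, so no separate integer bit trick is needed.
inline float SumAbs(const float* v, std::size_t n)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += std::fabs(v[i]);  // absolute value feeds directly into the addition
    return sum;
}
```

To judge it, look at the disassembly of a function like this in your real code, compiled with your real flags.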
You cannot measure the performance of an optimization by running a loop over an isolated piece of code; you must see how the compiler mixes everything together in the real code.