I have a high-precision ODE (ordinary differential equation) solver written in C++. All calculations are done with a user-defined type real_type. There is a typedef declaring this type in the header:
typedef long double real_type;
I decided to change the long double type to __float128 for more accuracy. In addition, I included quadmath.h and replaced all the standard math functions with their counterparts from libquadmath.
If the "long double" version is built without any optimization flags, a reference ODE is solved in 77 seconds. If the same version is built with the -O3 flag, the ODE is solved in 25 seconds, so -O3 speeds up the calculation roughly threefold.
But the "__float128" version built without flags solves a similar ODE in 190 seconds, and with -O3 in 160 seconds (about a 15% difference). Why does -O3 have such a weak effect on quadruple-precision calculations? Should I use other compiler flags or include other libraries?
Compiler optimizations work like this: the compiler recognizes certain patterns in your code and replaces them with equivalent but faster versions. Without knowing exactly what your code looks like and which optimizations the compiler performs, we can't say what the compiler is missing.
It's likely that several optimizations the compiler knows how to perform on native floating-point types and their operations, it doesn't know how to perform on __float128 and the library implementations of those operations. It might not recognize these operations for what they are, and it may be unable to look into the library implementations (you should try compiling the library together with your program and enabling link-time optimization).
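As a concrete starting point, enabling link-time optimization with GCC looks roughly like this (a sketch; solver.cpp is a hypothetical file name standing in for your sources):

```shell
# -flto lets the optimizer see across translation units at link time,
# which can expose the small libquadmath wrapper calls to inlining.
g++ -O3 -flto -c solver.cpp -o solver.o
g++ -O3 -flto solver.o -o solver -lquadmath
```

Whether this helps depends on whether libquadmath itself was built with LTO bytecode available; if it wasn't, the calls into it remain opaque to the optimizer.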