I want to write cross-platform C/C++ which has reproducible behaviour across different environments.

I understand that gcc's ffast-math enables various floating-point approximations. This is fine, but I need two separately-compiled binaries to produce the same results.

Say I use gcc always, but variously for Windows, Linux, or whatever, and different compiler versions.

Is there any guarantee that these compilations will yield the same set of floating-point approximations for the same source code?

1 Answer

Peter Cordes

No. It's not that -ffast-math allows a specific set of approximations; it's that -ffast-math allows compilers to assume FP math is associative when it's not, i.e. to ignore rounding error when transforming code into more efficient asm.

Any minor differences in choice of order of operations can affect the result by introducing different rounding.

Older compiler versions might implement sqrt(x) as x * approx_rsqrt(x) with a Newton-Raphson iteration under -ffast-math, because older CPUs had a slower sqrtps instruction, so it was more often worth replacing with an approximate reciprocal-sqrt plus 3 or 4 more multiply and add instructions. That tradeoff generally doesn't pay off on recent CPUs, so even with identical tuning options (especially the default -mtune=generic, as opposed to e.g. -mtune=haswell), the choices -ffast-math makes can change between GCC versions.

It's hard enough to get deterministic FP even without -ffast-math; different math libraries on different OSes have different implementations of functions like sin and log, which (unlike the basic ops + - * / sqrt) are not required to return a "correctly rounded" result, i.e. max error 0.5 ulp.

And extra precision for temporaries (FLT_EVAL_METHOD) can change the results if you compile for 32-bit x86 with x87 FP math (-mfpmath=387 is the default for -m32). If you want any hope here, avoid 32-bit x86. Or if you're stuck with it, you can maybe get away with -msse2 -mfpmath=sse...

You mentioned Windows, so I'm assuming you're only talking about x86 (Windows and x86 GNU/Linux), even though Linux runs on many other ISAs.

But even just within x86, compiling with -march=haswell enables use of FMA instructions, and GCC defaults to #pragma STDC FP_CONTRACT ON (even across C statements, beyond what the usual ISO C rules allow.) So actually even without -ffast-math, FMA availability can remove rounding for the x*y temporary in x*y + z.

With -ffast-math:

One version of gcc might decide to unroll a loop by 2 (and use 2 separate accumulators) when summing an array, while an older version of gcc with the same options might still sum in order.

(Actually, current gcc is terrible at this: when it does unroll (not by default), it often still uses the same (vector) accumulator, so it doesn't hide FP latency the way clang does. e.g. https://godbolt.org/z/X6DTxK uses different registers for the same variable, but it's still just one accumulator, with no vertical addition after the sum loop. Hopefully future gcc versions will be better. And differences between gcc versions in how they do a horizontal sum of a YMM or XMM register could introduce differences there when auto-vectorizing.)