Strict aliasing, -ffast-math and SSE


Consider the following program:

#include <iostream>
#include <cmath>
#include <cstring>
#include <xmmintrin.h>

using namespace std;

int main()
{
    // 4 float32s.
    __m128 nans;
    // Set them all to 0xffffffff which should be NaN.
    memset(&nans, 0xff, 4*4);

    // cmpord should return a mask of 0xffffffff for any non-NaNs, and 0x00000000 for NaNs.
    __m128 mask = _mm_cmpord_ps(nans, nans);
    // AND the mask with nans to zero any of the nans. The result should be 0x00000000 for every component.
    __m128 z = _mm_and_ps(mask, nans);

    cout << z[0] << " " << z[1] << " " << z[2] << " " << z[3] << endl;

    return 0;
}

If I compile with Apple Clang 7.0.2, both with and without -ffast-math, I get the expected output 0 0 0 0:

$ clang --version
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin14.5.0
Thread model: posix

$ clang test.cpp -o test
$ ./test
0 0 0 0 

$ clang test.cpp -ffast-math -o test
$ ./test 
0 0 0 0

However, after updating to 8.1.0 (sorry, I have no idea which upstream Clang version this corresponds to; Apple no longer publishes that information), -ffast-math seems to break this:

$ clang --version
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ clang test.cpp -o test
$ ./test
0 0 0 0 

$ clang test.cpp -ffast-math -o test
$ ./test 
nan nan nan nan

I suspect this is because of strict aliasing rules or something like that. Can anyone explain this behaviour?

Edit: I forgot to mention that if you do nans = { std::nanf(nullptr), ... it works fine.

Also, looking on godbolt, it seems that the behaviour changed between Clang 3.8.1 and Clang 3.9: the latter removes the cmpordps instruction. GCC 7.1 seems to leave it in.


There is 1 answer

Cornstalks On BEST ANSWER

This isn't a strict aliasing issue. If you read the documentation of -ffast-math, you'll see your issue:

Enable fast-math mode. This defines the __FAST_MATH__ preprocessor macro, and lets the compiler make aggressive, potentially-lossy assumptions about floating-point math. These include:

  • [...]
  • operands to floating-point operations are not equal to NaN and Inf, and
  • [...]

-ffast-math allows the compiler to assume that a floating point number is never NaN (because it sets the -ffinite-math-only option). Since clang tries to match gcc's options, we can read a little from GCC's option documentation to better understand what -ffinite-math-only does:

Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.

This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications.
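
To make the assumption concrete, here is a minimal scalar sketch. The function names is_nan_fp and is_nan_bits are made up for illustration, and the bit-level version assumes IEEE 754 binary32 floats. The first function is a floating-point comparison, so under -ffinite-math-only the compiler may fold it to a constant. The second only inspects the bit pattern with integer operations.

```cpp
#include <cstdint>
#include <cstring>

// Floating-point NaN test. Under -ffinite-math-only the compiler is allowed
// to assume x is never NaN, so it may reduce this to `return false`.
bool is_nan_fp(float x) {
    return x != x;
}

// Bit-level NaN test (assumes IEEE 754 binary32): a float is NaN when its
// exponent bits are all ones and its mantissa is non-zero. With the sign bit
// masked off, that means the value is greater than the pattern of +infinity.
bool is_nan_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // copy the bytes, no aliasing issues
    return (bits & 0x7fffffffu) > 0x7f800000u;
}
```

Even the integer version is not a guarantee: the flag lets the compiler assume that no NaN values exist at all, so it is best treated as a workaround rather than something the documentation promises.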

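So if your code needs to work with NaN, you can't use -ffast-math or -ffinite-math-only. Once the compiler may assume that no operand is NaN, an ordered comparison can never fail, so it is free to treat _mm_cmpord_ps(nans, nans) as an all-ones mask and drop the instruction, which matches the missing cmpordps you saw on godbolt with Clang 3.9. The AND then passes the NaN bit patterns through unchanged, hence the nan nan nan nan output. With either flag enabled you run the risk of the optimizer destroying your code, as you're seeing here.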
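
If you want to keep the rest of -ffast-math, one option is to append -fno-finite-math-only after it. Another is to do the NaN detection with integer instructions instead of a floating-point compare. The following is a rough sketch of the second approach, not a drop-in fix: zero_nans is a hypothetical helper name, it assumes SSE2 and IEEE 754 binary32 lanes, and the optimizer is still formally allowed to assume that NaNs never occur.

```cpp
#include <emmintrin.h>  // SSE2: integer operations on 128-bit vectors

// Returns v with every NaN lane replaced by +0.0f, using only integer ops.
__m128 zero_nans(__m128 v) {
    const __m128i bits = _mm_castps_si128(v);
    // Clear the sign bit so the signed 32-bit compare below is safe.
    const __m128i magnitude = _mm_and_si128(bits, _mm_set1_epi32(0x7fffffff));
    // A lane is NaN when its magnitude exceeds the bit pattern of +infinity.
    const __m128i is_nan = _mm_cmpgt_epi32(magnitude, _mm_set1_epi32(0x7f800000));
    // andnot computes (~is_nan) & bits: NaN lanes become 0, others pass through.
    return _mm_castsi128_ps(_mm_andnot_si128(is_nan, bits));
}
```

In the program from the question, this would replace the _mm_cmpord_ps / _mm_and_ps pair with __m128 z = zero_nans(nans);.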