accessing long double bit representation


TL;DR: Does the following code invoke undefined (or unspecified) behaviour?

#include <stdio.h>
#include <string.h>

void printme(void *c, size_t n)
{
  /* print n bytes in binary */
}

int main() {
  long double value1 = 0;
  long double value2 = 0;

  memset( (void*) &value1, 0x00, sizeof(long double));
  memset( (void*) &value2, 0x00, sizeof(long double));

  /* printf("value1: "); */
  /* printme(&value1, sizeof(long double)); */
  /* printf("value2: "); */
  /* printme(&value2, sizeof(long double)); */

  value1 = 0.0;
  value2 = 1.0;

  printf("value1: %Lf\n", value1);
  printme(&value1, sizeof(long double));
  printf("value2: %Lf\n", value2);
  printme(&value2, sizeof(long double));

  return 0;
}

On my x86-64 machine, the output depends on the specific optimization flags passed to the compiler (gcc-4.8.0, -O0 vs -O1).

With -O0, I get

value1: 0.000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
value2: 1.000000
00000000 00000000 00000000 00000000 00000000 00000000 00111111 11111111
10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

While with -O1, I get

value1: 0.000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
value2: 1.000000
00000000 00000000 00000000 00000000 00000000 01000000 00111111 11111111
10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 

Note the extra 1 in the second-to-last line. Also, uncommenting the print calls after the memset calls makes that 1 disappear. This seems to rely on three facts:

  1. long double is padded, i.e., sizeof(long double) = 16 but only 10 bytes are used.
  2. The call to memset might be optimized away.
  3. The padding bits of the long doubles might change without notice, i.e., floating-point operations on value1 and value2 seem to scramble the padding bits.
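To make point 1 concrete, the dumps above can be decoded by hand. The sketch below assumes the x87 80-bit extended layout that GCC uses for long double on x86-64: little-endian, a 64-bit significand with an explicit integer bit in bytes 0 to 7, then the 15-bit biased exponent and the sign bit in bytes 8 and 9, with bytes 10 to 15 being padding. `decode_x87` and `struct x87_fields` are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Fields of an x87 80-bit extended-precision value (assumed layout). */
struct x87_fields {
    unsigned sign;        /* 1 bit */
    unsigned exponent;    /* 15 bits, bias 16383 */
    uint64_t significand; /* 64 bits, including the explicit integer bit */
};

/* Decode the first 10 bytes of b, lowest address (least significant
   byte) first. Padding bytes beyond index 9 are never looked at. */
struct x87_fields decode_x87(const unsigned char b[10])
{
    struct x87_fields f;
    uint64_t m = 0;
    for (int i = 7; i >= 0; --i)          /* bytes 7..0: significand */
        m = (m << 8) | b[i];
    unsigned se = ((unsigned)b[9] << 8) | b[8]; /* sign + exponent */
    f.significand = m;
    f.exponent = se & 0x7FFFu;
    f.sign = se >> 15;
    return f;
}
```

Feeding it bytes 0 to 9 of value2 from the -O0 dump, `{0,0,0,0,0,0,0,0x80,0xFF,0x3F}`, gives sign 0, exponent 0x3FFF (unbiased 0) and significand 0x8000000000000000, i.e. 1.0. Read the same way, the stray bit in the -O1 dump sits in byte 10, one of the six padding bytes, not in the value.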

I'm compiling with -std=c99 -Wall -Wextra -Wpedantic and get no warnings, so I'm not sure whether this is a strict aliasing violation (but it might well be). Passing -fno-strict-aliasing doesn't change anything.
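For reference, reading an object's bytes through an `unsigned char` pointer, as printme does, is one of the accesses C99 6.5p7 explicitly permits (access through a character type), so that part on its own is not a strict aliasing problem. What it can still do is read padding whose contents the standard does not pin down. A minimal sketch of such a byte-level inspection; `count_set_bits` is a hypothetical helper:

```c
#include <stddef.h>

/* Count the 1 bits in the object representation of any object.
   Access through unsigned char is allowed by the aliasing rules, but if
   the object has padding, those bytes are counted too. */
size_t count_set_bits(const void *obj, size_t n)
{
    const unsigned char *p = obj;
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        for (unsigned char b = p[i]; b != 0; b &= (unsigned char)(b - 1))
            ++count; /* each iteration clears the lowest set bit */
    return count;
}
```

On a platform with IEEE binary64 `double`, 1.0 is 0x3FF0000000000000, so `count_set_bits(&d, sizeof d)` returns 10.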

The context is a bug in the HDF5 library, described here. HDF5 does some bit fiddling to figure out the native bit representation of floating-point types, but it gets confused if the padding bits do not stay zero.
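HDF5's actual probing code is not reproduced here. As a hypothetical sketch of the general idea, one can locate a field by XOR-ing the representations of two values that differ only in that field, e.g. 1.0 and -1.0 for the sign bit. If the objects carry padding whose contents differ, the padding shows up as extra differing bits, which is the kind of confusion described above. `diff_bits` is an illustrative name, not an HDF5 function.

```c
#include <stddef.h>

/* Count the bits that differ between two object representations of
   length n, and store in *highest the index (byte * 8 + bit, in memory
   order) of the highest differing bit. If nothing differs, returns 0
   and leaves *highest unchanged. */
size_t diff_bits(const void *a, const void *b, size_t n, size_t *highest)
{
    const unsigned char *pa = a, *pb = b;
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        unsigned x = (unsigned)(pa[i] ^ pb[i]);
        for (unsigned bit = 0; bit < 8; ++bit) {
            if (x & (1u << bit)) {
                ++count;
                *highest = i * 8 + bit; /* later hits have higher indices */
            }
        }
    }
    return count;
}
```

With `double` (no padding on common platforms), comparing 1.0 and -1.0 should report exactly one differing bit, at index 63 on a little-endian machine. Run on two x86-64 long doubles, the six padding bytes would also take part in the comparison.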

So:

  1. Is this undefined behaviour?
  2. Is this a strict aliasing violation?

Thanks.

edit: This is the code for printme. I admit I had just cut and pasted it from somewhere without paying much attention to it. If the fault is in here, I'll go around the table with my pants down.

void printme(void *c, size_t n)
{
  unsigned char *t = c;
  if (c == NULL)
    return;
  while (n > 0) {
    int q;
    --n;
    for(q = 0x80; q; q >>= 1) 
      printf("%x", !!(t[n] & q));
    printf(" ");
  }
  printf("\n");
}
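printme itself looks correct. `!!(t[n] & q)` evaluates to 0 or 1, which `%x` prints as a single digit, and the bytes are walked from the highest address down, so on a little-endian machine the most significant byte comes first. A hypothetical variant that writes the same text into a buffer instead of stdout makes that easy to check:

```c
#include <stddef.h>

/* Same output as printme(), minus the trailing newline, written into
   out: bytes from the highest address down, 8 binary digits and a space
   per byte. out must have room for 9 * n + 1 chars. */
void format_bits(const void *c, size_t n, char *out)
{
    const unsigned char *t = c;
    while (n > 0) {
        --n;
        for (unsigned q = 0x80; q != 0; q >>= 1)
            *out++ = (t[n] & q) ? '1' : '0';
        *out++ = ' ';
    }
    *out = '\0';
}
```

For the two bytes `{0x01, 0x80}` it produces `"10000000 00000001 "`.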

There are 3 answers

Answer by Pascal Cuoq (best answer, score 10)

Is this undefined behaviour?

Yes. The padding bits are indeterminate(*). Accessing indeterminate memory might as well be undefined behavior: it was undefined behavior in C90, and some C99 compilers treat it as undefined behavior. The C99 Rationale also says that accessing indeterminate memory is intended to be undefined behavior. The C99 standard itself does not say so as clearly, however: it only alludes to trap representations, which may give the impression that if one knows there are no trap representations, one can obtain unspecified values from indeterminate memory. At the very least, the padding part of the long double is unspecified.

(*) C99's footnote 271 says "The contents of 'holes' used as padding for purposes of alignment within structure objects are indeterminate." The text earlier refers to unspecified bytes, but that's only because bytes do not have trap representations.
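One way to avoid reading indeterminate padding at all is to copy only the bytes that carry the value into a zero-filled buffer, and to inspect or compare that buffer instead. This sketch rests on an explicit assumption: when `LDBL_MANT_DIG == 64` it assumes the x86 x87 layout (value in the first 10 bytes), and otherwise it treats every byte as significant. That guess would be wrong on a target with a different padded layout. `LD_VALUE_BYTES` and `ld_canonical_bytes` are hypothetical names.

```c
#include <float.h>
#include <string.h>

/* ASSUMPTION: x87 80-bit layout whenever LDBL_MANT_DIG == 64, with the
   value in the lowest 10 bytes; otherwise no padding is assumed. */
#if LDBL_MANT_DIG == 64
#define LD_VALUE_BYTES 10
#else
#define LD_VALUE_BYTES sizeof(long double)
#endif

/* Copy only the value bytes of x into out and zero the rest, so the
   buffer never contains indeterminate padding. */
void ld_canonical_bytes(long double x, unsigned char out[sizeof(long double)])
{
    memset(out, 0, sizeof(long double));
    memcpy(out, &x, LD_VALUE_BYTES);
}
```

Two long doubles holding the same value then yield identical buffers, whatever their padding held. Note that `==` remains the right test for numeric equality: +0.0 and -0.0 compare equal but differ in the sign bit.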

Is this a strict aliasing violation?

I do not see any strict aliasing violation in your code.

Answer by R.. GitHub STOP HELPING ICE (score 3)

While the C standard allows operations to clobber the padding bits, I don't think this is what's happening on your system. Rather, they're never being initialized to begin with, and GCC is simply optimizing out the memset at -O1, since the object is subsequently overwritten. This could probably be suppressed with -fno-builtin-memset.
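If the goal is only to make sure the zeroing actually happens in a test like the one in the question, a store the compiler is not allowed to drop can stand in for `memset`. A minimal sketch using a volatile byte loop; `zero_bytes_volatile` is a hypothetical name. It guarantees only the zeroing itself. As noted above, the standard still allows a later assignment such as `value2 = 1.0;` to change the padding.

```c
#include <stddef.h>

/* Zero n bytes through a volatile lvalue. Every store is a volatile
   access, so it must be performed even if the object is overwritten
   later; a plain memset() in that position can be removed as a dead
   store. */
void zero_bytes_volatile(void *p, size_t n)
{
    volatile unsigned char *v = p;
    while (n-- > 0)
        *v++ = 0;
}
```

In the question's program this would replace the two `memset` calls, e.g. `zero_bytes_volatile(&value2, sizeof value2);`.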

Answer by Lee Daniel Crocker (score 3)

I don't see anything undefined here, or even unspecified (two very different things). Yes, the memset() calls are optimized out. On my machine (x86-32), long double is 12 bytes, padded to 16 in structs and on the stack. On your machine, they are clearly full 16 bytes, since sizeof(long double) is returning 16. Neither of the "printme" outputs resembles proper IEEE 128-bit floating-point format, so I suspect there are other bugs in the printme() function that aren't shown here.
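Because sizeof(long double) includes padding, it does not say how many bytes hold the value. With GCC the same 80-bit format occupies 12 bytes on 32-bit x86 and 16 on x86-64. `LDBL_MANT_DIG` from `<float.h>` is a better hint. A heuristic sketch; `long_double_format` is a hypothetical name, and the mapping covers only the formats listed and cannot reveal where any padding sits.

```c
#include <float.h>

/* Guess the long double format from the number of significand digits.
   Heuristic only: it says nothing about padding or byte order. */
const char *long_double_format(void)
{
    switch (LDBL_MANT_DIG) {
    case 53:  return "same as double (IEEE binary64)";
    case 64:  return "x87-style 80-bit extended";
    case 106: return "IBM double-double";
    case 113: return "IEEE binary128";
    default:  return "unknown";
    }
}
```

On the x86-64 machine from the question this would report the 80-bit extended format, which is consistent with the 10 value bytes visible in the dumps.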