float a = 1.0 + ((float) (1 << 25));
float b = 1.0 + ((float) (1 << 26));
float c = 1.0 + ((float) (1 << 27));
What are the float values of a, b, and c after running this code? Explain why the bit layout of a, b, and c causes each value to be what it is.
When int is 32 bits, the integer shifts 1 << 25, 1 << 26, and 1 << 27 are well defined and exact. The code is not shifting a float, @EOF.

Casting those power-of-2 values to float is also well defined, with no precision loss.

Adding each of those to the double 1.0 gives a well-defined, exact sum. A typical double has a 53-bit significand (e.g., DBL_MANT_DIG == 53) and can represent 0x8000001.0p0 exactly.
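Finally, the code attempts to assign those double values to a float. Although the values are within the range of a typical float encoding, a float cannot represent them exactly. A typical float has a 24-bit significand (e.g., FLT_MANT_DIG == 24), while 2^25 + 1, 2^26 + 1, and 2^27 + 1 need 26, 27, and 28 significant bits.
The conversion rounds in an implementation-defined manner; a typical implementation rounds to nearest, ties to even.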
Output
The bit layout is not the issue. It is the 24-bit significand of float (FLT_MANT_DIG == 24), together with the implementation-defined rounding behavior, that results in the double value being rounded to a nearby float. Any float layout with FLT_MANT_DIG == 24 would give like results.