float a = 1.0 + ((float) (1 << 25));
float b = 1.0 + ((float) (1 << 26));
float c = 1.0 + ((float) (1 << 27));
What are the float values of a, b, and c after running this code? Explain why the bit layout of a, b, and c causes each value to be what it is.
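When int is 32 bits wide, the integer shifts 1 << 25, 1 << 26, and 1 << 27 are well defined and exact. The code is not shifting a float, @EOF.
The casts of those power-of-2 values to float are also well defined, with no precision loss.
Adding those to the double constant 1.0 gives well-defined, exact sums. A typical double has a 53-bit significand and can represent 0x8000001.0p0 exactly, e.g. DBL_MANT_DIG == 53.
Finally, the code attempts to assign these double values to a float. Although they are within the range of a typical float encoding, a float cannot represent them exactly.
A typical float has a 24-bit significand, e.g. FLT_MANT_DIG == 24. With 24 bits, adjacent float values at 2^25, 2^26, and 2^27 are 4, 8, and 16 apart, so a sum that is only 1 above the power of 2 is not representable and must be rounded.
The conversion happens in an implementation-defined manner. A typical implementation rounds to nearest, ties to even, so each sum rounds down to the power of 2.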
Output
The bit layout is not the issue. It is a property of float with FLT_MANT_DIG == 24 (a 24-bit significand), together with implementation-defined rounding behavior, that results in the double value being rounded to a nearby float one. Any float layout with FLT_MANT_DIG == 24 would give like results.