From Computer Systems: a Programmer's Perspective:
With single-precision floating point, the expression
(3.14f+1e10f)-1e10f
evaluates to 0.0: the value 3.14 is lost due to rounding. The expression
(1e20f*1e20f)*1e-20f
evaluates to +∞, while
1e20f*(1e20f*1e-20f)
evaluates to 1e20f.
How can I detect loss of precision due to rounding in both floating-point addition and multiplication?
What is the relation and difference between underflow and the problem that I described? Is underflow only a special case of loss of precision due to rounding, where a result is rounded to zero?
Thanks.
While in mathematics addition and multiplication of real numbers are associative operations, they are not associative when performed on floating-point types like float, because those types have limited precision and limited range. So the order of evaluation matters.
Considering the examples, the number 10000000003.14 can't be exactly represented as a 32-bit float, so the result of (3.14f + 1e10f) would be equal to 1e10f, which is the closest representable number. Of course, 3.14f + (1e10f - 1e10f) would yield 3.14f instead.
Note that I used the f postfix, because in C the expression (3.14+1e10)-1e10 involves double literals, so that the result would indeed be 3.14 (or, more likely, something like 3.1399994, because the intermediate sum is rounded in double as well).
Something similar happens in the second example, where 1e20f * 1e20f is already beyond the range of float (but not of double), so it overflows to +∞ and the successive multiplication cannot bring the value back. In the other expression, (1e20f * 1e-20f) is performed first and has a well-defined result (about 1), so the successive multiplication yields the correct answer.
In practice, there are some precautions you can adopt. The type double is the best fit for most applications, unless there are other requirements.