How can I detect loss of precision due to rounding in both floating point addition and multiplication?

Question


Asked by Tim on 10 October 2020 at 17:43

From Computer Systems: a Programmer's Perspective:

With single-precision floating point

the expression (3.14f+1e10f)-1e10f evaluates to 0.0: the value 3.14 is lost due to rounding.

the expression (1e20f*1e20f)*1e-20f evaluates to +∞, while 1e20f*(1e20f*1e-20f) evaluates to 1e20f.

How can I detect loss of precision due to rounding in both floating point addition and multiplication?
What is the relation and difference between underflow and the problem that I described? Is underflow only a special case of loss of precision due to rounding, where a result is rounded to zero?

Thanks.


**Bob__** · Accepted Answer · 2020-10-10T19:15:53+00:00

In mathematics, addition and multiplication of real numbers are associative operations, but the same operations are not associative when performed on floating point types like float, because those types have limited precision and a limited range.

So the order matters.

Considering the examples, the number 10000000003.14 can't be exactly represented as a 32-bit float (floats near 1e10 are spaced 1024 apart), so the result of (3.14f + 1e10f) is 1e10f, which is the closest representable number. Of course, 3.14f + (1e10f - 1e10f) would yield 3.14f instead.

Note that I used the f suffix because in C the expression (3.14+1e10)-1e10 involves double literals, so the result would indeed be close to 3.14 (more precisely, something like 3.1399994).
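As a rough illustration of how the lost part of an addition can be measured, here is a minimal sketch of Knuth's TwoSum. This technique is not mentioned above; I'm adding it because it answers the "detect" part directly: it returns the rounded sum together with the rounding error, so a non-zero error means precision was lost. The names two_sum and two_sum_result are made up for this example, and the volatile qualifiers are only there on the assumption that the compiler might otherwise keep intermediates in a wider format.

```c
/* Rounded sum plus the rounding error of that single addition. */
typedef struct {
    float sum;  /* a + b rounded to float */
    float err;  /* what rounding discarded: a + b == sum + err (no overflow) */
} two_sum_result;

/* Hypothetical helper: Knuth's TwoSum, assuming round-to-nearest
 * and finite operands whose sum does not overflow. */
two_sum_result two_sum(float a, float b)
{
    /* volatile forces every intermediate to be rounded to float */
    volatile float s  = a + b;
    volatile float bv = s - a;   /* the part of b that ended up in s */
    volatile float av = s - bv;  /* the part of a that ended up in s */
    volatile float da = a - av;  /* what was lost from a */
    volatile float db = b - bv;  /* what was lost from b */

    two_sum_result r = { s, da + db };
    return r;
}
```

With this sketch, two_sum(3.14f, 1e10f) gives a sum equal to 1e10f and an err equal to 3.14f, i.e. the whole small term was absorbed.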

Something similar happens in the second example: 1e20f * 1e20f is already beyond the range of float (but not of double), so it overflows to +∞ and the successive multiplication by 1e-20f cannot bring it back. In the other expression, (1e20f * 1e-20f) is performed first and has a well-defined result (1), so the successive multiplication yields the correct answer.
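For multiplication, one possible check (again a sketch, not the only way) relies on the fact that the product of two floats always fits exactly in a double, so the float result can be compared against it. The helper check_mul and the mul_status values are hypothetical names for this example, and the function assumes both operands are finite.

```c
#include <math.h>   /* isinf: C99 classification macro */

typedef enum {
    MUL_EXACT,      /* product is exactly representable as a float */
    MUL_ROUNDED,    /* product was rounded to a nearby float */
    MUL_UNDERFLOW,  /* non-zero product was rounded all the way to zero */
    MUL_OVERFLOW    /* product exceeded the float range and became infinite */
} mul_status;

/* Hypothetical helper; assumes a and b are finite. */
mul_status check_mul(float a, float b, float *product)
{
    volatile float rounded = a * b;  /* volatile forces rounding to float */
    float p = rounded;
    *product = p;

    if (isinf(p))
        return MUL_OVERFLOW;
    if (p == 0.0f && a != 0.0f && b != 0.0f)
        return MUL_UNDERFLOW;

    /* Two 24-bit significands give at most 48 bits, which fits in the
     * 53-bit significand of a double, so this product is exact. */
    double exact = (double)a * (double)b;
    return ((double)p == exact) ? MUL_EXACT : MUL_ROUNDED;
}
```

With this sketch, check_mul(1e20f, 1e20f, &p) reports MUL_OVERFLOW and leaves an infinite value in p, while check_mul(3.0f, 0.5f, &p) reports MUL_EXACT.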

In practice, there are some precautions you can adopt:

Use a wider type. double is a good fit for most applications, unless there are other requirements.
Reorder the operations, if possible. For example, if you have to add many terms and you know that some of them are smaller than others, add the smaller ones first, then the others. Avoid subtracting numbers that are nearly equal. In general, there may be a more accurate way to evaluate an algebraic expression than the naive one (e.g. Horner's method for polynomial evaluation).
If you have some sort of knowledge of the problem domain, you may already know which part of the computation may have problematic values and check if those are greater (or lower) than some limits, before performing the calculation.
Check the results as soon as possible. There's no point in continuing a calculation when you already have an infinite value or a NaN, or in iterating further when your target value isn't modified at all.
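The reordering and early-check precautions can be sketched together in a small summation routine. This is only an illustration under my own assumptions: sum_checked and sum_report are hypothetical names, the loop stops as soon as the running total is no longer finite, and it counts the non-zero terms that left the total unchanged.

```c
#include <math.h>    /* isfinite: C99 classification macro */
#include <stddef.h>  /* size_t */

typedef struct {
    float  sum;       /* running total when the loop stopped */
    size_t absorbed;  /* non-zero terms that did not change the total */
    size_t used;      /* number of terms consumed before stopping */
} sum_report;

/* Hypothetical helper: left-to-right float sum with early checks. */
sum_report sum_checked(const float *x, size_t n)
{
    sum_report r = { 0.0f, 0, 0 };

    for (size_t i = 0; i < n; ++i) {
        volatile float rounded = r.sum + x[i];  /* force rounding to float */
        float next = rounded;
        r.used = i + 1;

        if (!isfinite(next)) {   /* infinity or NaN: stop right away */
            r.sum = next;
            break;
        }
        if (x[i] != 0.0f && next == r.sum)
            r.absorbed++;        /* this term was lost to rounding */
        r.sum = next;
    }
    return r;
}
```

Summing 1e8f followed by sixteen 1.0f terms with this sketch leaves the total at 1e8f and reports 16 absorbed terms, while summing the sixteen 1.0f terms first and 1e8f last gives 100000016.0f with nothing absorbed.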
