error bound in function approximation algorithm

245 views Asked by At

Suppose we have the set of floating point number with "m" bit mantissa and "e" bits for exponent. Suppose more over we want to approximate a function "f".

From the theory we know that usually a "range reduced function" is used and then from such function we derive the global function value.

For example let x = (sx,ex,mx) (sign exp and mantissa) then... log2(x) = ex + log2(1.mx) so basically the range reduced function is "log2(1.mx)".

I have implemented at present reciprocal, square root, log2 and exp2, recently i've started to work with the trigonometric functions. But i was wandering if given a global error bound (ulp error especially) it is possible to derive an error bound for the range reduced function, is there some study about this kind of problem? Speaking of the log2(x) (as example) i would lke to be able to say...

"ok i want log2(x) with k ulp error, to achieve this given our floating point system we need to approximate log2(1.mx) with p ulp error"

Remember that as i said we know we are working with floating point number, but the format is generic, so it could be the classic F32, but even for example e=10, m = 8 end so on.

I can't actually find any reference that shows such kind of study. Reference i have (i.e. muller book) doesn't treat the topic in this way so i was looking for some kind of paper or similar. Do you know any reference?

I'm also trying to derive such bound by myself but it is not easy...

1

There are 1 answers

2
mcdowella On

There is a description of current practice, along with a proposed improvement and an error analysis, at https://hal.inria.fr/ensl-00086904/document. The description of current practice appears consistent with the overview at https://docs.oracle.com/cd/E37069_01/html/E39019/z4000ac119729.html, which is consistent with my memory of the most talked about problem being the mod pi range reduction of trigonometric functions.

I think IEEE floating point was a big step forwards just because it standardized things at a time when there were a variety of computer architectures, so lowering the risks of porting code between them, but the accuracy requirements implied by this may have been overkill: for many problems the constraint on the accuracy of the output is the accuracy of the input data, not the accuracy of the calculation of intermediate values.