Why is the IEEE-754 exponent bias used in this C code 126.94269504 instead of 127?


The following C function is from the fastapprox project.

static inline float 
fasterlog2 (float x)
{
  union { float f; uint32_t i; } vx = { x };
  float y = vx.i;
  y *= 1.1920928955078125e-7f;
  return y - 126.94269504f;
}

Could someone explain why the exponent bias used in the code above is 126.94269504 instead of 127? Is it a more accurate bias value?


There are 2 answers

Mr. Llama (best answer)

In the project you linked, they included a Mathematica notebook with an explanation of their algorithms, which includes the "mysterious" -126.94269 value.
If you need a viewer, you can get one from the Mathematica website for free.

Edit: Since I'm feeling generous, here's the relevant section in screenshot form.

Simply put, they explain that the value is "simpler, faster, and less accurate".
They're not using -126.94269 in place of -127; they're using it in place of the result of the following calculation (values rounded for brevity):

-124.2255 - 1.498 * mx - (1.72588 / (0.35201 + mx))
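
To make the comparison concrete, here is a minimal C sketch of what the longer calculation might look like as a function. The name log2_rational_sketch is hypothetical, the constants are the rounded ones quoted above, and the definition of mx is an assumption (the mantissa bits reinterpreted as a float in [0.5, 1)), since the post does not define it. It also assumes 32-bit IEEE-754 floats and a positive, normal input.

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: uses the rounded constants quoted in the answer above.
   Assumes IEEE-754 binary32 floats and a positive, normal x. */
static inline float log2_rational_sketch(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* raw bit pattern of x */

    /* Assumption: mx is the mantissa field with the exponent forced
       so that the value lies in [0.5, 1). */
    uint32_t mbits = (bits & 0x007FFFFFu) | 0x3F000000u;
    float mx;
    memcpy(&mx, &mbits, sizeof mx);

    float y = (float)bits * 1.1920928955078125e-7f;   /* bits / 2^23 */

    /* fasterlog2 replaces everything after y with the single
       constant 126.94269504f. */
    return y - 124.2255f - 1.498f * mx - 1.72588f / (0.35201f + mx);
}
```

Under those assumptions, log2_rational_sketch(8.0f) should come out close to 3. The single constant in fasterlog2 trades that extra multiply and divide for a coarser result.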

Steve Summit

Well, no, 126.94269504 is not a "more accurate" bias value. This code is doing something very, very strange; I'm pretty surprised it works at all. It takes the bits of a float as if they were an int (which in my experience usually gives you a totally garbage value, but maybe not), then takes that "garbage" int value and converts it back to a float, then does some math on it. This is, as they say, a fast and approximate way of doing something, in this case, taking the base-2 log. It shouldn't work at all, but the difference between 127 and 126.94269504 is evidently just one of several goofy fudge factors which are intended to salvage some meaning from what ought to be meaningless code. (Sort of a "two almost wrongs make an almost-right" kind of thing.)

If you want to extract exactly the mantissa and exponent of a float (though this will be neither as fast nor as approximate), the usual way to do it is with the frexpf function.
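
A minimal sketch of that approach follows. The float_parts struct and the split_float name are hypothetical; frexpf itself is the standard function from <math.h>.

```c
#include <math.h>

/* Hypothetical container for the two pieces frexpf produces. */
typedef struct {
    float mantissa;   /* in [0.5, 1) for finite, nonzero input */
    int   exponent;   /* x == mantissa * 2^exponent */
} float_parts;

static float_parts split_float(float x)
{
    float_parts p;
    p.mantissa = frexpf(x, &p.exponent);
    return p;
}
```

For example, split_float(8.0f) yields a mantissa of 0.5 and an exponent of 4, since 8 = 0.5 * 2^4. Note that frexpf normalizes the mantissa to [0.5, 1), not to the [1, 2) range of the stored IEEE-754 significand, so the exponent it reports is one higher than the unbiased exponent in the bit pattern.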