How to divide a nonnegative variable integer by a constant fixed-point rational ≥1 without overflow (at all, if possible)?

219 views Asked by At

How to compute ⌊÷⌋ for a variable integer and a fixed-point rational number ≥1 precisely? The input ∈ [0, INT_MAX] is stored in a variable of type int, ∈ [1, INT_MAX] is given as a decimal literal with a decimal separator in the form . (you may alternatively represent it as a fraction, say, /100, if this helps, or as a product of prime powers divided by another product of prime powers), and the output should be a value of type int. Here, ÷ is mathematical division. Our running example is ⌊÷65781.76⌋; within this example, you can assume that int is at least 3 bytes wide (as otherwise the result would be trivially 0). How to do this in C properly?

Clearly, (int)(/65781.76f) would involve floating-point arithmetic (and, according to http://www.h-schmidt.net/FloatConverter/IEEE754.html, 65781.76, as most constants, is not precisely representable at least on my machine), whereas 100*/6578176L and 25*/1644544L risk an overflow too much. In any case, to my knowledge, adding the suffix L or LL (as in, e.g., 25LL*/1644544L) is pointless because the types long long, long, and int are not guaranteed to be of different widths. Any other options? I'm interested in solutions involving integer arithmetic only and (most importantly) a proper explanation.

Ideally, we'd like to have a precise and portable solution; if possible, let's stick to C89/C90 without external libraries.

4

There are 4 answers

2
Eric Postpischil On BEST ANSWER

Specific Number

For the specific number in the question, we want ⌊/65781.76⌋ = ⌊•25/(2048•803)⌋ = ⌊•(16+8+1)/2048 / 803⌋ = ⌊(/128 + /256 + /2048) / 803⌋.

Next, we separate the integer and fraction portions of /128, /256, and /2048: ⌊(⌊/128⌋ + %128/128 + ⌊/256⌋ + %256/256 + ⌊/2048⌋ + x%2048/2048) / 803⌋.

Then we group the fractions, using “%” to indicate the remainder operation: ⌊(⌊/128⌋ + ⌊/256⌋ + ⌊/2048⌋ + %128/128 + %256/256 + x%2048/2048) / 803⌋.

And consolidate them into one term: ⌊(⌊/128⌋ + ⌊/256⌋ + ⌊/2048⌋ + (%128•16 + %256•8 + x%2048)/2048) / 803⌋.

Now the only fraction in the sum is in (%128•16 + %256•8 + x%2048)/2048, so we may discard it, taking its integer portion: ⌊(⌊/128⌋ + ⌊/256⌋ + ⌊/2048⌋ + ⌊(%128•16 + %256•8 + x%2048)/2048⌋) / 803⌋.

So all parts may be calculated with integer arithmetic: (x/128 + x/256 + x/2048 + (x % 128 * 16 + x % 256 * 8 + x % 2048)/2048) / 803.

Observe that all multiplications, remainders, and divisions except the division by 803 may be computed with bit shifts and masks.

General Cases

The above relies on the fact that the quotient in question had a power of two in its factorization. In general, given p and q with p•(q−1) ≤ INT_MAX, we can compute ⌊xp/q⌋ with x/q*p + x%q*p/q, because ⌊xp/q⌋ = ⌊x/qp⌋ = ⌊(⌊x/q⌋+x/q−⌊x/q⌋)•p⌋ = ⌊⌊x/q⌋•p + (x/q−⌊x/q⌋)•p⌋ = ⌊⌊x/q⌋•p + (x%q/q⌋)•p⌋ = ⌊x/q⌋•p + ⌊x%q/qp⌋ = ⌊x/q⌋•p + ⌊x%qp/q⌋.

Thus, for the number in question, we seek ⌊x•25/1,644,544⌋, so we can use x / 1644544 * 25 + x % 1644544 * 25 / 1644544.

This uses two divisions, one to compute the quotient and remainder of x when divided by q and then a second division by q.

0
Ted Lyngmo On

Here's one approach going from your original integers:

            x
--------------------------
              7227
      65536 * ----
               100
      ------------
           72
           72x
--------------------------
              7227
      65536 * ----
               100
           7200x                          25x
-------------------------- => --------------------------
        473628672                       1644544
if(x <= INT_MAX / 25)                   // no overflow will happen
    res = 25 * x / 1644544;
else                                    // overflow would happen
    res = x / (1644544 / 25);

Of course, the precision in the cases multiplying x with 25 would cause an overflow will suffer, so you could consider not dividing by 1644544 / 25 (65781) in those special cases but you actually divide by 1644544.0 / 25.0 (65781.76).

For the normal, non-overflowing cases, it will be precice.

Using a 64 bit integer will give you the range [0, 737869762948382064) and if your int is 32 bit (which it most probably is), it will cover that full range without resorting to the backup solution:

unsigned long long calc(unsigned long long x) {
    if(x <= ULLONG_MAX / 25ull)
        return 25ull * x / 1644544ull;
    else // backup
        return (long double)x / (1644544.0L / 25.0L);
}
0
John Bollinger On

How to divide a nonnegative variable integer by a positive constant float without risking too much of an overflow?

You're overthinking it. The way to do this job is by using ordinary FP division. Your objection to that seems to be on the grounds that your divisor is not exactly representable in binary floating point, but that does not necessarily imply that incorrect results will be produced.

Indeed, if you use type double instead of type float (and assume IEEE 754 arithmetic, which is pretty safe) then your divisor has has 53 bits of mantissa, all significant, and your int dividend has at most 31 bits of precision, with as few as just one significant (leading zeroes are not significant). Moreover, given the magnitude of your dividend, your truncated quotient will have at most 15 significant bits, with a whopping 38 bits of fraction having been lopped off.

Your dividend will be exactly represented, and your divisor will be represented with an error not exceeding 0.5 ulp. The result of the division will be within 0.5 ulp of the exact mathematical result of dividing by the rounded-to-within-0.5-ulp divisor, which will be well within 1 ulp of the mathematical result of the desired exact division.

You are then truncating at least 38 bits of fraction. The maximum error (1 bit) in the FP quotient relative to the mathematical quotient can make a difference in the truncated result only if the computed fractional part has either all bits 0 or all bits 1, and the error is (i) of sufficient magnitude (ii) in the wrong direction. That's less than 1 case in 238 if all fractional parts are equally likely, compared to only 231 - 1 positive values representable by a 32-bit int.

That doesn't prove that all ints will in fact yield correct results, but it's sufficiently encouraging to just test. There are only around 2 billion cases for your target data type (int; assuming 32 bits), and computers do arithmetic really fast. For example:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    int64_t int_scale = 6578176;
    double float_scale = 65781.76;
    int count = 0;

    puts("Testing ...");
    for (int64_t i = 1; i <= INT32_MAX; i++) {
        int32_t int_result = (100 * i) / int_scale;
        int32_t flt_result = i / float_scale;

        if (int_result != flt_result) {
            printf("Difference at %" PRId64 "\n", i);
            count++;
        }
    }

    printf("%d differences\n", count);
}

Result:

$ time ./divtest
Testing ...
0 differences

real    0m2.198s
user    0m2.196s
sys     0m0.000s

So although you can write a complicated formula for calculating the result using only integer arithmetic, it will not give you different results for operands within the range you specified, and it will be harder to read and understand. So again, just use FP division.

0
chux - Reinstate Monica On

Key idea: quotient = x * c_power_of-2 / c_significand


When int is 32-bit or less:

Given c >= 1, finite float values c are all representable by some_integer/some_power_of_2 or some_integer/(1ul << n).

Review frexpf() to extract the fraction and exponent of c.

float c = ...;
int n;
float frac = frexpf(c, &n);
// Scale the fraction to an integer and adjust the exponent.
long long some_integer = frac * (1uLL << FLT_MANT_DIG);
n -= FLT_MANT_DIG;

If c is a constant, than let code use constants for n and some_integer.


Some sample code with tests.
Certainly not exhaustive, but a start.

#include <stdbool.h>
#include <stdint.h>
#include <assert.h>
#include <stdlib.h>
#include <float.h>
#include <math.h>
#include <stdio.h>

int int_div_float(int x, float y) {
  assert(x > 0);
  assert(y >= 1.0f);
  assert(x <= 0x7FFFFFFF);

  int y_expo;
  float y_frac = frexpf(y, &y_expo);
  y_expo -= FLT_MANT_DIG;
  y_frac *= 1uLL << FLT_MANT_DIG;
  assert(y_frac == (long long ) y_frac);
  long long y_frac_scaled = (long long) y_frac;
  if (y_expo <= 0) {
    long long x_wide = (long long) x << -y_expo;
    return (int) (x_wide / y_frac_scaled);
  }
  if (y_expo >= 32) {
    x = 0;
  } else {
    x >>= y_expo;
  }
  return (int) (x / y_frac_scaled);
}

bool int_div_float_test(int x, float y) {
  int q0 = (int) ((double) x / y);
  int q1 = int_div_float(x, y);
  if (q0 != q1) {
    printf("x:%10d y:%-15.9g q0:%10d  q1:%10d\n", x, y, q0, q1);
  }
  return q0 != q1;
}

int rand_int(void) {
  union {
    unsigned u;
    int i;
  } x = {0};
  for (unsigned i = 0; i < sizeof x; i++) {
    x.u <<= 8;
    x.u ^= (unsigned) rand();
  }
  return x.i;
}

float rand_float(void) {
  union {
    unsigned u;
    float f;
  } x = {0};
  for (unsigned i = 0; i < sizeof x; i++) {
    x.u <<= 8;
    x.u ^= (unsigned) rand();
  }
  return x.f;
}

unsigned int_div_float_tests(unsigned n) {
  unsigned error = 0;
  while (n-- > 0) {
    int x;
    do {
      x = rand_int();
    } while (x < 0);
    float y;
    do {
      y = rand_float();
    } while (!isfinite(y) || (y < 1.0f));
    error += int_div_float_test(x, y);
  }
  return error;
}

int main(void) {
  printf("Errors: %u\n", int_div_float_tests(100000*1000));
}

Output

Errors: 0