Unsigned 128-bit division on 64-bit machine

9.6k views Asked by At

I have a 128-bit number stored as 2 64-bit numbers ("Hi" and "Lo"). I need only to divide it by a 32-bit number. How could I do it, using the native 64-bit operations from CPU?

(Please, note that I DO NOT need an arbitrary precision library. Just need to know how to make this simple division using native operations. Thank you).

4

There are 4 answers

1
F.D.Castel On BEST ANSWER

If you are storing the value (128-bits) using the largest possible native representation your architecture can handle (64-bits) you will have problems handling the intermediate results of the division (as you already found :) ).

But you always can use a SMALLER representation. What about FOUR numbers of 32-bits? This way you could use the native 64-bits operations without overflow problems.

A simple implementation (in Delphi) can be found here.

7
Elmue On

I have a DECIMAL structure which consists of three 32-bit values: Lo32, Mid32 and Hi32 = 96 bit totally.

You can easily expand my C code for 128-bit, 256-bit, 512-bit or even 1024-bit division.

// in-place divide Dividend / Divisor including previous rest and returning new rest
static void Divide32(DWORD* pu32_Dividend, DWORD u32_Divisor, DWORD* pu32_Rest)
{
    ULONGLONG u64_Dividend = *pu32_Rest;
    u64_Dividend <<= 32;
    u64_Dividend |= *pu32_Dividend;

    *pu32_Dividend = (DWORD)(u64_Dividend / u32_Divisor);
    *pu32_Rest     = (DWORD)(u64_Dividend % u32_Divisor);
}

// in-place divide 96 bit DECIMAL structure
static bool DivideByDword(DECIMAL* pk_Decimal, DWORD u32_Divisor)
{
    if (u32_Divisor == 0)
        return false;

    if (u32_Divisor > 1)
    {
        DWORD u32_Rest = 0;
        Divide32(&pk_Decimal->Hi32,  u32_Divisor, &u32_Rest); // Hi FIRST!
        Divide32(&pk_Decimal->Mid32, u32_Divisor, &u32_Rest);
        Divide32(&pk_Decimal->Lo32,  u32_Divisor, &u32_Rest);
    }
    return true;
}
1
phuclv On

How could I do it, using the native 64-bit operations from CPU?

Since you want native operations, you'll have to use some built-in types or intrinsic functions. All the above answers will only give you general C solutions which won't be compiled to the division instruction

Most modern 64-bit compilers have some ways to do a 128-by-64 division. In MSVC use _div128() and _udiv128() so you'll just need to call _udiv128(hi, lo, divisor, &remainder)

The _div128 intrinsic divides a 128-bit integer by a 64-bit integer. The return value holds the quotient, and the intrinsic returns the remainder through a pointer parameter. _div128 is Microsoft specific.

_(u)div128 maps directly to the CPU's narrowing (i)div instruction so it'll fault on overflow like the instruction's behavior itself. Therefore you'll need to check for overflow before calling


In Clang, GCC and ICC there's an __int128 type and you can use that directly

unsigned __int128 div128by32(unsigned __int128 x, uint64_t y)
{
    return x/y;
}

This isn't a narrowing operation. The widths of result and all the operands are the same, hence overflow won't happen. Under the hood it'll check if it's a 128-to-64 division and use the (i)div instruction like above and only do a software division if the divisor is more than 64 bits

0
Thomas On

The subtitle to volume two of The Art of Computer Programming is Seminumerical Algorithms. It's appropriate, as the solution is fairly straight-forward when you think of the number as an equation instead of as a number.

Think of the number as Hx + L, where x is 264. If we are dividing by, call it Y, it is then trivially true that Hx = (N + M)x where N is divisible by Y and M is less than Y. Why would I do this? (Hx + L) / Y can now be expressed as (N / Y)x + (Mx + L) / Y. The values N, N / Y, and M are integers: N is just H / Y and M is H % Y However, as x is 264, this still brings us to a 128 by something divide, which will raise a hardware fault (as people have noted) should Y be 1.

So, what you can do is reformulate the problem as (Ax3 + Bx2 + Cx + D) / Y, with x being 232. You can now go down: (A / Y)x3 + (((A % Y)x + B) / Y)x2 + (((((A % Y)x + B) % Y)x + C) / Y)x + ((((((A % Y)x + B) % Y)x + C) / Y)x + D) / Y. If you only have 64 bit divides: you do four divides, and in the first three, you take the remainder and shift it up 32 bits and or in the next coefficient for the next division.

This is the math behind the solution that has already been given twice.