I've got four unsigned 32-bit integers representing an unsigned 128-bit integer, in little endian order:
```c
typedef struct {
    unsigned int part[4];
} bigint_t;
```
I'd like to convert this number into its decimal string representation and output it to a file.
Right now, I'm using a `bigint_divmod10` function to divide the number by 10, keeping track of the remainder. I call this function repeatedly, outputting the remainder as a digit, until the number is zero. It's pretty slow. Is this the fastest way to do it? If so, is there a clever way to implement this function that I'm not seeing? I've tried looking at GMP's `get_str.c`, but I find it pretty impenetrable.
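For reference, here is a minimal sketch of the repeated-division approach described above. It assumes `uint32_t` words and a 64-bit intermediate (`uint64_t`); the body of `bigint_divmod10` and the helpers `bigint_is_zero` and `bigint_to_dec` are my own illustration, not the asker's code. Each pass walks all four words, and a 39-digit value needs 39 passes, which is where the slowness comes from.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the question's bigint_t (part[0] least significant),
   but with uint32_t so the word size is explicit. */
typedef struct {
    uint32_t part[4];
} bigint_t;

/* Divide *n by 10 in place and return the remainder: schoolbook short
   division from the most significant word down, with a 64-bit intermediate. */
static unsigned bigint_divmod10(bigint_t *n)
{
    uint64_t rem = 0;
    for (int i = 3; i >= 0; i--) {
        uint64_t cur = (rem << 32) | n->part[i];  /* rem <= 9, so cur < 10 * 2^32 */
        n->part[i] = (uint32_t)(cur / 10);
        rem = cur % 10;
    }
    return (unsigned)rem;
}

static int bigint_is_zero(const bigint_t *n)
{
    return (n->part[0] | n->part[1] | n->part[2] | n->part[3]) == 0;
}

/* Write the decimal form of n (passed by value, so the caller's copy is
   untouched) into out, which must hold at least 40 bytes: 2^128 - 1 has
   39 digits, plus the terminating NUL. */
static void bigint_to_dec(bigint_t n, char out[40])
{
    char tmp[40];
    char *p = tmp + sizeof tmp;   /* digits are produced least significant first */
    *--p = '\0';
    do {
        assert(p > tmp);          /* never more than 39 digits */
        *--p = (char)('0' + bigint_divmod10(&n));
    } while (!bigint_is_zero(&n));
    memcpy(out, p, (size_t)(tmp + sizeof tmp - p));
}
```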
EDIT: here's the fastest code I was able to come up with for the divmod10 function:
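```c
/* Assumes uint128 holds the value as unsigned int word[4] with word[0]
   least significant (like bigint_t above), that uint128_add (below) is
   declared before this function, and that uint128_shift(value, -1)
   (not shown) shifts the value right by one bit. */
static unsigned uint128_divmod10(uint128 *value)
{
    unsigned int a = value->word[3];
    unsigned int b = value->word[2];
    unsigned int c = value->word[1];
    unsigned int d = value->word[0];

    /* Divide each 32-bit word by 5 on its own. */
    unsigned int diva = a / 5;
    unsigned int divb = b / 5;
    unsigned int divc = c / 5;
    unsigned int divd = d / 5;
    value->word[3] = diva;
    value->word[2] = divb;
    value->word[1] = divc;
    value->word[0] = divd;

    /* Per-word remainders, each in 0..4. */
    unsigned int moda = a - diva*5;
    unsigned int modb = b - divb*5;
    unsigned int modc = c - divc*5;
    unsigned int modd = d - divd*5;

    /* Fold the remainders down one word at a time. Since
       2^32 = 5*858993459 + 1, a remainder r moved into the next lower
       word adds r*858993459 to that word's quotient and r to the
       running remainder. */
    unsigned int mod = 0;
    mod += moda;
    unsigned int carryb = mod*858993459;
    mod += modb;
    if (mod >= 5) {
        mod -= 5;
        carryb++;
    }
    unsigned int carryc = mod*858993459;
    mod += modc;
    if (mod >= 5) {
        mod -= 5;
        carryc++;
    }
    unsigned int carryd = mod*858993459;
    mod += modd;
    if (mod >= 5) {
        mod -= 5;
        carryd++;
    }
    uint128_add(value, carryd, 0);
    uint128_add(value, carryc, 1);
    uint128_add(value, carryb, 2);
    /* value now holds x / 5 and mod holds x % 5. */

    /* x / 10 = (x / 5) / 2; if x / 5 is odd, the remainder gains 5. */
    if (value->word[0] & 1) {
        mod += 5;
    }
    uint128_shift(value, -1);
    return mod;
}
```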
where the add function is defined as:
```c
/* Add k to word[pos] and ripple any carry into the higher words. */
static void uint128_add(uint128 *value, unsigned int k, unsigned int pos)
{
    unsigned int a = value->word[pos];
    value->word[pos] += k;
    if (value->word[pos] < a) {
        // overflow: propagate the carry upward
        for (int i=pos+1; i<4; i++) {
            value->word[i]++;
            if (value->word[i]) {
                break;
            }
        }
    }
}
```
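The constant 858993459 works because 2^32 = 5 × 858993459 + 1, so 2^32 ≡ 1 (mod 5): a remainder r left over in one word contributes r × 858993459 to the quotient of the next lower word and r to the running remainder. Below is a self-contained sketch of the same divide-by-5-then-halve idea that compiles and can be tested on its own. The `uint128` typedef and the helper `uint128_shr1` (standing in for the unshown `uint128_shift(value, -1)`) are assumptions, and `uint128_add` is a loop-based rewrite of the version above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the question's uint128 type: word[0] is least significant. */
typedef struct {
    uint32_t word[4];
} uint128;

/* Add k at word position pos, propagating the carry upward. */
static void uint128_add(uint128 *v, uint32_t k, int pos)
{
    assert(pos >= 0 && pos < 4);
    for (int i = pos; i < 4 && k != 0; i++) {
        uint32_t old = v->word[i];
        v->word[i] = old + k;
        k = v->word[i] < old;              /* 1 if this word wrapped around */
    }
}

/* Logical right shift by one bit (plays the role of uint128_shift(v, -1)). */
static void uint128_shr1(uint128 *v)
{
    for (int i = 0; i < 3; i++)
        v->word[i] = (v->word[i] >> 1) | (v->word[i + 1] << 31);
    v->word[3] >>= 1;
}

/* Divide *v by 10 in place and return the remainder, using only 32-bit
   divisions by the constant 5. */
static unsigned uint128_divmod10(uint128 *v)
{
    uint32_t r[4];
    for (int i = 0; i < 4; i++) {          /* per-word quotient and remainder by 5 */
        r[i] = v->word[i] % 5;
        v->word[i] /= 5;
    }
    /* Fold remainders downward: r * 2^32 = 5 * (r * 858993459) + r. */
    uint32_t mod = r[3];
    for (int i = 2; i >= 0; i--) {
        uint32_t carry = mod * 858993459u; /* mod <= 4, so this fits in 32 bits */
        mod += r[i];
        if (mod >= 5) {
            mod -= 5;
            carry++;
        }
        uint128_add(v, carry, i);
    }
    /* *v is now x / 5 and mod is x % 5; halving gives x / 10. */
    if (v->word[0] & 1)
        mod += 5;
    uint128_shr1(v);
    return mod;
}
```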
It depends what else you're doing with the numbers. You can trade off a slight loss in space efficiency and a modest loss in efficiency of multiprecision arithmetic in return for very efficient conversion to and from decimal. The key is to do multiprecision arithmetic with a base that is a power of 10 rather than a power of 2.
For example, you might use base 10,000, where you pack one digit into a 16-bit word and do your arithmetic on digits in 32-bit integers. (On a 64-bit machine you can go further: store each digit in a 32-bit word, do the arithmetic in 64-bit integers, and use base 1,000,000,000.) This kind of code is relatively efficient, although not quite as fast as using the native power of two, because you can't take advantage of the hardware carry bit. You also can't represent as many integers in the same number of bits. But it's a whiz at converting to and from decimal, because each digit converts on its own without any long division.
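To make the trade-off concrete, here is a minimal sketch of a base-10,000 representation; the type `dec_bigint` and the functions `dec_from_string`, `dec_add` and `dec_to_string` are hypothetical names. Addition has to detect carries with a comparison because there is no hardware carry flag for base 10,000, while conversion to and from decimal just maps each group of four decimal characters to one digit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEC_BASE 10000u
#define DEC_NDIGITS 10   /* 10 digits x 4 = 40 decimal places, enough for 2^128 - 1 */

/* Hypothetical base-10,000 integer: d[0] is the least significant digit and
   every digit is in 0..9999, so it fits in 16 bits. */
typedef struct {
    uint16_t d[DEC_NDIGITS];
} dec_bigint;

/* Parse 1 to 40 decimal characters: each group of four characters,
   counted from the right, becomes one base-10,000 digit. */
static void dec_from_string(dec_bigint *x, const char *s)
{
    size_t n = strlen(s);
    assert(n >= 1 && n <= 4 * DEC_NDIGITS);
    memset(x, 0, sizeof *x);
    for (int i = 0; n > 0; i++) {
        size_t start = n >= 4 ? n - 4 : 0;
        unsigned v = 0;
        for (size_t j = start; j < n; j++)
            v = v * 10 + (unsigned)(s[j] - '0');
        x->d[i] = (uint16_t)v;
        n = start;
    }
}

/* z = x + y, returning the final carry. Each digit sum is formed in a
   32-bit integer and the carry is found by comparison. */
static unsigned dec_add(dec_bigint *z, const dec_bigint *x, const dec_bigint *y)
{
    unsigned carry = 0;
    for (int i = 0; i < DEC_NDIGITS; i++) {
        uint32_t s = (uint32_t)x->d[i] + y->d[i] + carry;
        carry = s >= DEC_BASE;
        z->d[i] = (uint16_t)(carry ? s - DEC_BASE : s);
    }
    return carry;
}

/* Decimal output needs no long division over the whole number: the top
   digit is printed as is and every lower digit as exactly four characters.
   out needs at least 41 bytes. */
static void dec_to_string(const dec_bigint *x, char *out)
{
    int top = DEC_NDIGITS - 1;
    while (top > 0 && x->d[top] == 0)   /* skip leading zero digits */
        top--;
    int len = sprintf(out, "%u", (unsigned)x->d[top]);
    for (int i = top - 1; i >= 0; i--)
        len += sprintf(out + len, "%04u", (unsigned)x->d[i]);
}
```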
If you need to represent the full range of numbers from zero to 2^128 - 1, you can still do this, but you need extra digits, so your numbers will be bigger: 2^128 - 1 has 39 decimal digits, which takes ten base-10,000 digits (160 bits) rather than 128 bits.

If it turns out you really need the extra space or speed (maybe you're doing a lot of cryptographic 128-bit calculations), then the method of simultaneous div/mod by 10 is the fastest method I know. The only other trick is that if small integers are common, you can handle them specially. (That is, if the three most significant 32-bit words are all zero, just use the native division to convert.)
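A sketch of the small-integer shortcut, assuming a 64-bit `uint64_t` is available: instead of checking only that the top three words are zero, it checks the top two and lets the C library's 64-bit conversion handle the rest. `bigint_as_u64` and `bigint_fprint_small` are hypothetical helpers; when they return 0, the caller falls back to the general div/mod-by-10 loop.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t part[4];   /* part[0] is least significant, as in the question */
} bigint_t;

/* If n fits in 64 bits, store it in *out and return 1; otherwise return 0. */
static int bigint_as_u64(const bigint_t *n, uint64_t *out)
{
    if (n->part[3] != 0 || n->part[2] != 0)
        return 0;
    *out = ((uint64_t)n->part[1] << 32) | n->part[0];
    return 1;
}

/* Print n in decimal with the native 64-bit conversion when it is small.
   Returns 1 if it was printed, 0 if the caller must use the general path. */
static int bigint_fprint_small(FILE *f, const bigint_t *n)
{
    uint64_t v;
    assert(f != NULL);
    if (!bigint_as_u64(n, &v))
        return 0;
    fprintf(f, "%" PRIu64, v);
    return 1;
}
```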
Dave Hanson's C Interfaces and Implementations has a lengthy chapter on multiprecision arithmetic. Dividing a large number by a single digit is a special case that has this efficient implementation:
For full understanding, it really helps to have the book, but the source code is still a lot easier to understand than the GNU source code. And you could easily adapt it to use base 10,000 (it currently uses base 256).
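The book's code isn't reproduced here. As a rough, hypothetical sketch of the same idea (short division of a multi-digit number by a single digit) adapted to base 10,000 as suggested, `dec_div_small` below walks from the most significant digit down, carrying each remainder into the next digit. With a base of 10,000 and a divisor below 10,000, every intermediate value stays below 10^8, so 32-bit arithmetic suffices.

```c
#include <assert.h>
#include <stdint.h>

#define DEC_BASE 10000u

/* Short division: z = x / y for an n-digit base-10,000 number x
   (x[0] least significant) and a single digit 0 < y < 10,000.
   Returns the remainder. z may be the same array as x. */
static unsigned dec_div_small(uint16_t *z, const uint16_t *x, int n, unsigned y)
{
    assert(y > 0 && y < DEC_BASE);
    uint32_t rem = 0;
    for (int i = n - 1; i >= 0; i--) {
        uint32_t cur = rem * DEC_BASE + x[i];   /* rem < y, so cur < y * 10,000 */
        z[i] = (uint16_t)(cur / y);             /* always < 10,000 */
        rem = cur % y;
    }
    return (unsigned)rem;
}
```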
Summary: if your performance bottleneck is conversion to decimal, implement multiprecision arithmetic with a base that is a power of 10. If your machine's native word size is 32 bits and you are using C, use base 10,000 with each digit in a 16-bit word.