Long long int makes my Sieve of Eratosthenes super slow?

337 views Asked by At

I have a program that requires me to find primes up till 10**10-1 (10,000,000,000). I wrote a Sieve of Eratosthenes to do this, and it worked very well (and accurately) as high as 10**9 (1,000,000,000). I confirmed its accuracy by having it count the number of primes it found, and it matched the value of 50,847,534 on the chart found here. I used unsigned int as the storage type and it successfully found all the primes in approximately 30 seconds.

However, 10**10 requires that I use a larger storage type: long long int. Once I switched to this, the program is running signifigantly slower (its been 3 hours plus and its still working). Here is the relevant code:

typedef unsigned long long ul_long;
typedef unsigned int u_int;

ul_long max = 10000000000;                            
u_int blocks = 1250000000;
char memField[1250000000];     

char mapBit(char place) {             //convert 0->0x80, 1->0x40, 2->0x20, and so on
    return 0x80 >> (place);
}

for (u_int i = 2; i*i < max; i++) {

    if (memField[i / 8] & activeBit) {               //Use correct memory block
        for (ul_long n = 2 * i; n < max; n += i) {
            char secondaryBit = mapBit(n % 8);       //Determine bit position of n
            u_int activeByte = n / 8;                //Determine correct memory block
            if (n < 8) {                             //Manual override memory block and bit for first block
                secondaryBit = mapBit(n);
                activeByte = 0;
            }
            memField[activeByte] &= ~(secondaryBit);  //Set the flag to false
        }
    }
    activeBit = activeBit >> 1;                       //Check the next
    if (activeBit == 0x00) activeBit = 0x80;
} 

I figure that since 10**10 is 10x larger then 10**9 it should take 10 times the amount of time. Where is the flaw in this? Why did changing to long long cause such significant performance issues and how can I fix this? I recognize that the numbers get larger, so it should be somewhat slower, but only towards the end. Is there something I'm missing.

Note: I realize long int should technically be large enough but my limits.h says it isn't even though I'm compiling 64 bit. Thats why I use long long int in case anyone was wondering. Also, keep in mind, I have no computer science training, just a hobbyist.

edit: just ran it in "Release" as x86-64 with some of the debug statements suggested. I got the following output:

enter image description here

looks like I hit the u_int bound. I don't know why i is getting that large.

1

There are 1 answers

11
chqrlie On BEST ANSWER

Your program has an infinite loop in for (u_int i = 2; i*i < max; i++). i is an unsigned int so i*i wraps at 32-bit and is always less than max. Make i an ul_long.

Note that you should use simpler bit pattern from 1 to 0x80 for bit 0 to 7.

Here is a complete version:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long long ul_long;
typedef unsigned int u_int;

#define TESTBIT(a, bit)   (a[(bit) / 8] & (1 << ((bit) & 7)))
#define CLEARBIT(a, bit)  (a[(bit) / 8] &= ~(1 << ((bit) & 7)))

ul_long count_primes(ul_long max) {
    size_t blocks = (max + 7) / 8;
    unsigned char *memField = malloc(blocks);
    if (memField == NULL) {
        printf("cannot allocate memory for %llu bytes\n",
               (unsigned long long)blocks);
        return 0;
    }
    memset(memField, 255, blocks);
    CLEARBIT(memField, 0);  // 0 is not prime
    CLEARBIT(memField, 1);  // 1 is not prime
    // clear bits after max
    for (ul_long i = max + 1; i < blocks * 8ULL; i++) {
        CLEARBIT(memField, i);
    }

    for (ul_long i = 2; i * i < max; i++) {
        if (TESTBIT(memField, i)) {           //Check if i is prime
            for (ul_long n = 2 * i; n < max; n += i) {
                CLEARBIT(memField, n);                   //Reset all multiples of i
            }
        }
    }
    unsigned int bitCount[256];
    for (int i = 0; i < 256; i++) {
        bitCount[i] = (((i >> 0) & 1) + ((i >> 1) & 1) +
                       ((i >> 2) & 1) + ((i >> 3) & 1) +
                       ((i >> 4) & 1) + ((i >> 5) & 1) +
                       ((i >> 6) & 1) + ((i >> 7) & 1));
    }
    ul_long count = 0;
    for (size_t i = 0; i < blocks; i++) {
        count += bitCount[memField[i]];
    }
    printf("count of primes up to %llu: %llu\n", max, count);
    free(memField);
    return count;
}

int main(int argc, char *argv[]) {
    if (argc > 1) {
        for (int i = 1; i < argc; i++) {
            count_primes(strtoull(argv[i], NULL, 0));
        }
    } else {
        count_primes(10000000000);
    }
    return 0;
}

It completes in 10 seconds for 10^9 and 131 seconds for 10^10:

count of primes up to 1000000000: 50847534
count of primes up to 10000000000: 455052511