I don't know if this is possible or not, but I just gotta ask. My mathematical and algorithmic skills are kind of failing me here :P
The thing is I now have this class that generates prime numbers up to a certain limit:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class Atkin : IEnumerable<ulong>
{
    private readonly List<ulong> primes;
    private readonly ulong limit;

    public Atkin(ulong limit)
    {
        this.limit = limit;
        primes = new List<ulong>();
    }

    private void FindPrimes()
    {
        var isPrime = new bool[limit + 1];
        var sqrt = Math.Sqrt(limit);

        // Toggle candidates using the three Atkin quadratic forms,
        // with x and y ranging over all pairs up to sqrt(limit).
        for (ulong x = 1; x <= sqrt; x++)
            for (ulong y = 1; y <= sqrt; y++)
            {
                var n = 4 * x * x + y * y;
                if (n <= limit && (n % 12 == 1 || n % 12 == 5))
                    isPrime[n] ^= true;

                n = 3 * x * x + y * y;
                if (n <= limit && n % 12 == 7)
                    isPrime[n] ^= true;

                n = 3 * x * x - y * y;
                if (x > y && n <= limit && n % 12 == 11)
                    isPrime[n] ^= true;
            }

        // Squarefree step: any multiple of a prime's square is composite.
        for (ulong n = 5; n <= sqrt; n++)
            if (isPrime[n])
            {
                var s = n * n;
                for (ulong k = s; k <= limit; k += s)
                    isPrime[k] = false;
            }

        // Collect the results, seeding the two primes the
        // mod-12 tests above can never produce.
        primes.Add(2);
        primes.Add(3);
        for (ulong n = 5; n <= limit; n++)
            if (isPrime[n])
                primes.Add(n);
    }

    public IEnumerator<ulong> GetEnumerator()
    {
        // Sieve lazily on first enumeration.
        if (!primes.Any())
            FindPrimes();

        foreach (var p in primes)
            yield return p;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
Now, what I would like is to get rid of the limit so that the sequence would never stop (theoretically).
I am thinking it could go something like this:
- Start with some trivial limit
- Find all the primes up to the limit
- Yield all the newfound primes
- Increase the limit (by doubling or squaring the old limit or something like that)
- Goto step 2
And ideally, step two should only have to work between the old limit and the new one; in other words, it shouldn't have to find the lowest primes again and again.
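To make that concrete, here is a naive sketch of the loop I have in mind, reusing the Atkin class above (the PrimesUnbounded name is just mine for illustration). It doubles the limit and re-sieves everything from scratch each round, merely skipping the primes already yielded, which is exactly the redundant work I'd like to avoid:

// Naive sketch: grows the limit forever, but re-sieves from scratch
// every round instead of only working between the old and new limits.
public static IEnumerable<ulong> PrimesUnbounded()
{
    ulong limit = 1000;       // start with some trivial limit
    ulong lastYielded = 1;    // largest prime yielded so far
    while (true)
    {
        foreach (var p in new Atkin(limit))
            if (p > lastYielded)
            {
                lastYielded = p;
                yield return p;   // yield only the newfound primes
            }
        limit *= 2;               // increase the limit and go again
    }
}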
Is there a way this can be done? My main problem is that I don't quite understand what x and y, for example, are in this algorithm. Like, could I use much the same algorithm but set x and y to oldLimit (initially 1) and run it up to newLimit? Or how would that work? Any bright minds with some light to shed on this?
The point of this is that I won't have to set that limit at all, so that I can, for example, use LINQ and just Take() however many primes I need without worrying about whether the limit is high enough, et cetera.
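That way I could write, say (using the hypothetical PrimesUnbounded sketch from above):

var first10000 = PrimesUnbounded().Take(10000).ToList();

and never give the limit another thought.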
The following code implements the optimizations discussed at the bottom of my previous answer and includes the following features:
The base primes representation has been reduced to one byte per base prime for a further reduction in memory footprint. Thus, the total memory footprint, other than the code, is the array holding this base primes representation for the primes up to the square root of the current range being processed, plus the packed-bit page buffers, which are currently set just under the L2 cache size of 256 Kilobytes (the smallest page size of 14,586 bytes times the CHNKSZ of 17 as supplied, about 248 Kilobytes) each per CPU core, plus one extra buffer for the foreground task to process. With this code, about three Megabytes is sufficient to process the prime range up to ten to the fourteenth power. As well as the speed gained by allowing efficient multiprocessing, this reduced memory requirement is the other advantage of using a paged sieve implementation.
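A rough sketch of the page-segmentation idea in its simplest form, using plain Sieve of Eratosthenes culling of one small page buffer at a time (an unpacked bool page, no threading, no Atkin quadratics, and with the PagedSieve/SegmentedPrimes names made up for illustration; the real code is packed-bit, multi-threaded, and Atkin-based), might look like this:

using System;
using System.Collections.Generic;

static class PagedSieve
{
    public static IEnumerable<ulong> SegmentedPrimes(ulong limit)
    {
        // Base primes up to sqrt(limit), found once with a small plain sieve;
        // these are all that is ever needed to cull any page below the limit.
        int root = (int)Math.Sqrt(limit);
        var composite = new bool[root + 1];
        var basePrimes = new List<int>();
        for (int p = 2; p <= root; p++)
            if (!composite[p])
            {
                basePrimes.Add(p);
                for (long m = (long)p * p; m <= root; m += p)
                    composite[m] = true;
            }
        foreach (var p in basePrimes)
            yield return (ulong)p;

        // Cull one fixed-size page at a time; only this small buffer
        // (not an array the size of the whole range) is ever in memory.
        const int PageSize = 1 << 16;
        var page = new bool[PageSize];
        for (ulong low = (ulong)root + 1; low <= limit; low += PageSize)
        {
            Array.Clear(page, 0, PageSize);
            ulong high = Math.Min(low + PageSize - 1, limit);
            foreach (int p in basePrimes)
            {
                ulong up = (ulong)p;
                ulong start = (low + up - 1) / up * up; // first multiple of p in page
                for (ulong m = start; m <= high; m += up)
                    page[m - low] = true;
            }
            for (ulong n = low; n <= high; n++)
                if (!page[n - low])
                    yield return n;
        }
    }
}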
The above code takes about 59 milliseconds to find the primes to two million (slightly slower than some of the other, simpler codes due to initialization overhead), but it calculates the primes to one billion and to the full 32-bit number range in 1.55 and 5.95 seconds, respectively. This isn't much faster than the last version, because DotNet imposes an extra array-bounds check per prime candidate when enumerating the found primes, and the time spent culling composite numbers is less than a third of the time spent enumerating; the saving in culling composites is therefore cancelled out by the extra enumeration time. However, for many tasks involving primes, one does not need to enumerate all of them but can just compute the answers without enumeration.
For the above reasons, this class provides the example static methods "CountTo", "SumTo", and "ElementAt" to count or sum the primes up to a given limit, or to output the zero-based nth prime, respectively. The "CountTo" method will produce the number of primes to one billion and in the 32-bit number range in about 0.32 and 1.29 seconds, respectively; the "ElementAt" method will produce the last element in these ranges in about 0.32 and 1.25 seconds, respectively; and the "SumTo" method produces the sum of all the primes in these ranges in about 0.49 and 1.98 seconds, respectively. This program calculates the sum of all the prime numbers to four billion plus in less time than many naive implementations can sum all the primes to two million (as in Euler Problem 10), for over 2000 times the practical range!
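For a sense of the interface only (these signatures are assumed, and unlike the real methods, which compute their results without enumerating every prime, these naive versions do enumerate, so they would not achieve the timings above), such helpers layered over any IEnumerable<ulong> prime source might look like:

using System.Collections.Generic;
using System.Linq;

public static class PrimeQueries
{
    // Number of primes up to and including n.
    public static long CountTo(IEnumerable<ulong> primes, ulong n)
    {
        return primes.TakeWhile(p => p <= n).LongCount();
    }

    // Sum of the primes up to and including n.
    public static ulong SumTo(IEnumerable<ulong> primes, ulong n)
    {
        ulong sum = 0;
        foreach (var p in primes.TakeWhile(q => q <= n))
            sum += p;
        return sum;
    }

    // The zero-based nth prime.
    public static ulong ElementAt(IEnumerable<ulong> primes, int n)
    {
        return primes.Skip(n).First();
    }
}

For instance, SumTo(new Atkin(2000000), 2000000) reproduces the Euler Problem 10 sum of 142913828922, just nowhere near as quickly as the non-enumerating versions described above.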
This code is only about four times slower than the very highly optimized C code used by primesieve, and the reasons it is slower are mostly DotNet overheads (discussing the case of a 256 Kilobyte buffer, which is the size of the L2 cache).
This DotNet code will count (CountTo) the number of primes to ten to the power thirteen (ten trillion) in about an hour and a half (tested) and the number of primes to a hundred trillion (ten to the fourteenth) in just over half a day (estimated), as compared to twenty minutes and under four hours for primesieve, respectively. This is interesting historically, as until 1985 only the count of primes in the range to ten to the thirteenth was known, since it would have taken too much execution time on the (expensive) supercomputers of that day to find the range ten times larger; now we can easily compute the number of primes in those ranges on a common desktop computer (in this case, an Intel i7-2700K at 3.5 GHz)!
Using this code, it is easy to understand why Professor Atkin and Bernstein thought that the SoA was faster than the SoE, a myth that persists to this day.
EDIT_ADD: Interestingly, this code runs 30% faster in x86 32-bit mode than in x64 64-bit mode, likely due to avoiding the slight extra overhead of extending the uint32 numbers to ulongs. All of the above timings are for 64-bit mode. END_EDIT_ADD
In summary: The segment paged Sieve of Atkin is actually slower than a maximally optimized segment paged Sieve of Eratosthenes with no saving in memory requirements!!!
I say again: "Why use the Sieve of Atkin?".