Segmented Sieve of Atkin, possible?


I am aware of the fact that the Sieve of Eratosthenes can be implemented so that it finds primes continuously without an upper bound (the segmented sieve).

My question is, could the Sieve of Atkin/Bernstein be implemented in the same way?

Related question: C#: How to make Sieve of Atkin incremental

However, the related question has only one answer, which says "It's impossible for all sieves", which is obviously incorrect.


There are 2 answers

user448810 (BEST ANSWER)

Atkin/Bernstein give a segmented version in Section 5 of their original paper. Presumably Bernstein's primegen program uses that method.

GordonBGood

In fact, one can implement an unbounded Sieve of Atkin (SoA) without using segmentation at all, as I have done here in F#. Note that this is a pure functional version that doesn't even use arrays to combine the solutions of the quadratic equations with the square-free filter, and thus is considerably slower than a more imperative approach.
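
As a concrete reference point, below is a minimal bounded, array-based Sieve of Atkin in F# showing the quadratic toggles and the square-free filter that the unbounded version combines as sequences; this is only a sketch of the textbook algorithm (the name sieveOfAtkin is mine), not the pure functional code linked above.

```fsharp
// Minimal bounded Sieve of Atkin sketch (imperative, array-based), for illustration only.
let sieveOfAtkin (limit: int) =
    let isPrime = Array.create (limit + 1) false
    let sqrtLimit = int (sqrt (float limit))
    for x in 1 .. sqrtLimit do
        for y in 1 .. sqrtLimit do
            // 4*x^2 + y^2 toggles candidates with n mod 12 = 1 or 5
            let n = 4 * x * x + y * y
            if n <= limit && (n % 12 = 1 || n % 12 = 5) then isPrime.[n] <- not isPrime.[n]
            // 3*x^2 + y^2 toggles candidates with n mod 12 = 7
            let n = 3 * x * x + y * y
            if n <= limit && n % 12 = 7 then isPrime.[n] <- not isPrime.[n]
            // 3*x^2 - y^2 (x > y) toggles candidates with n mod 12 = 11
            let n = 3 * x * x - y * y
            if x > y && n <= limit && n % 12 = 11 then isPrime.[n] <- not isPrime.[n]
    // square-free filter: clear every multiple of the square of a surviving candidate
    for n in 5 .. sqrtLimit do
        if isPrime.[n] then
            for k in (n * n) .. (n * n) .. limit do isPrime.[k] <- false
    seq { if limit >= 2 then yield 2
          if limit >= 3 then yield 3
          for n in 5 .. limit do if isPrime.[n] then yield n }
```

For example, `sieveOfAtkin 100 |> Seq.toList` lists the primes up to 100.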

Bernstein's optimizations using look-up tables for optimum 32-bit ranges would make the code extremely complex and not suitable for presentation here, but it would be quite easy to adapt my F# code so that the sequences start at a set lower limit and are used only over a range in order to implement a segmented version, and/or to apply the same techniques to a more imperative approach using arrays.

Note that even Bernstein's implementation of the SoA isn't really faster than the Sieve of Eratosthenes with all possible optimizations, as per Kim Walisch's primesieve; it is only faster than an equivalently optimized version of the Sieve of Eratosthenes for the range of numbers selected in his implementation.

EDIT_ADD: For those who do not want to wade through Bernstein's pseudo-code and C code, I am adding to this answer a pseudo-code method for using the SoA over a range from LOW to HIGH, where the span from LOW to HIGH + 1 might be constrained to an even multiple of 60 in order to use the modulo (and potential bit-packing to only the entries on the 2,3,5 wheel) optimizations.

This is based on a possible implementation using the SoA quadratics (4*x^2 + y^2), (3*x^2 + y^2), and (3*x^2 - y^2), expressed as sequences of numbers with the x value for each sequence fixed, ranging from one up to SQRT((HIGH - 1) / 4), SQRT((HIGH - 1) / 3), and (from solving the quadratic 2*x^2 + 2*x - (HIGH + 1) = 0) x = (SQRT(1 + 2 * (HIGH + 1)) - 1) / 2, respectively, with the sequences expressed in my F# code as per the top link. Optimizations to the sequences there use the fact that, when sieving only odd composites, the "4x" sequences need only odd values of y, and the "3x" sequences need only odd values of y when x is even and vice versa. Further optimizations reduce the number of solutions to the quadratic equations (= elements in the sequences) by observing that the modulo patterns over the above sequences repeat over very small ranges of x and also repeat over ranges of y of only 30; this is used in the Bernstein code but not (yet) implemented in my F# code.
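
As a small illustration of those x limits, the following F# sketch computes them directly from HIGH, assuming y starts at one for the "4x" and "3x+" sequences and at x - 1 for the "3x-" sequence; the function and binding names are mine, not taken from the linked F# code.

```fsharp
// x ranges for the three quadratic families, as derived in the text above.
let xLimits (high: int) =
    let xLimit4 = int (sqrt ((float high - 1.0) / 4.0))        // 4*x^2 + y^2 with y >= 1
    let xLimit3Plus = int (sqrt ((float high - 1.0) / 3.0))    // 3*x^2 + y^2 with y >= 1
    // 3*x^2 - y^2 with its lowest y = x - 1 gives 2*x^2 + 2*x - 1 <= HIGH,
    // i.e. the positive root of 2*x^2 + 2*x - (HIGH + 1) = 0
    let xLimit3Minus = int ((sqrt (1.0 + 2.0 * (float high + 1.0)) - 1.0) / 2.0)
    xLimit4, xLimit3Plus, xLimit3Minus
```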

I also do not include the well-known optimizations that could be applied to the prime "square-free" culling, namely wheel factorization and the calculation of the starting segment address, as I use in my implementations of a segmented SoE.

So, for purposes of calculating the starting addresses within a segment for the "4x", "3x+", and "3x-" sequences (or with "3x+" and "3x-" combined, as I do in the F# code), and having calculated the ranges of x for each as per the above, the pseudo-code is as follows:

  1. Calculate the range LOW - FIRST_ELEMENT, where FIRST_ELEMENT is the sequence element for the lowest applicable value of y for the given value of x (or y = x - 1 in the case of the "3x-" sequence).

  2. For the job of calculating how many elements are in this range, this boils down to the question of how many of (y1)^2, (y2)^2, (y3)^2, ... there are, where successive y values are separated by two so as to produce even or odd y's as required. As usual in analysing square sequences, we observe that the differences between successive squares grow by a constant increment: delta(9 - 1) is 8, delta(25 - 9) is 16 for an increase of 8, delta(49 - 25) is 24 for a further increase of 8, and so on, so that for n elements the last increment is 8 * n in this example. Expressing an element of the sequence this way, it is FIRST_ELEMENT (or whatever one chooses as the first element) plus eight times a partial sum of the series (1 + 2 + 3 + ... + n). Standard reduction of such a linear series gives this sum as (n + 1) * n / 2, or n^2/2 + n/2, so, with the range expressed in units of that common factor of eight (range = (LOW - FIRST_ELEMENT) / 8), we can solve for the number of elements n in the range from the quadratic equation n^2/2 + n/2 - range = 0, or n = (SQRT(8 * range + 1) - 1) / 2.

  3. Now, if FIRST_ELEMENT + 4 * (n + 1) * n does not equal LOW as the starting address, add one to n and use FIRST_ELEMENT + 4 * (n + 2) * (n + 1) as the starting address (see the code sketch following this list). If one uses further optimizations to apply wheel-factorization culling to the sequence pattern, look-up table arrays can be used to find the closest used value of n that satisfies the conditions.

  4. The modulus 12 or 60 of the starting element can be calculated directly or can be produced by use of look up tables based on the repeating nature of the modulo sequences.

  5. Each sequence is then used to toggle the composite states up to the HIGH limit. If the additional logic is added to the sequences so that they jump between only the applicable elements of each sequence, no further use of modulo conditions is necessary.

  6. The above is done for every "4x" sequence, followed by the "3x+" and "3x-" sequences (or with "3x+" and "3x-" combined into just one set of "3x" sequences), up to the x limits as calculated earlier or as tested per loop.
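
As referenced in step 3, here is a small F# sketch of steps 1 to 3 for a single "4x" sequence with a fixed x and odd y starting at one; the helper name startAddress4x is my own, and the final check guards against floating-point rounding in the square root.

```fsharp
// Starting address (first sequence element >= LOW) for one "4x" sequence at a fixed x.
let startAddress4x (x: int) (low: int) =
    let firstElement = 4 * x * x + 1                     // step 1: lowest element, with y = 1
    if firstElement >= low then firstElement             // sequence already starts inside the segment
    else
        let delta = low - firstElement                   // step 2: delta = 8 * range in the text's terms
        // n = (SQRT(8 * range + 1) - 1) / 2 with range = delta / 8, i.e. (SQRT(delta + 1) - 1) / 2
        let n = int ((sqrt (float delta + 1.0) - 1.0) / 2.0)
        let candidate = firstElement + 4 * n * (n + 1)
        // step 3: if the candidate falls short of LOW, advance to the next element
        if candidate >= low then candidate
        else firstElement + 4 * (n + 1) * (n + 2)
```

For example, with x = 1 (so FIRST_ELEMENT = 5) and LOW = 50, the elements are 5, 13, 29, 53, ..., and `startAddress4x 1 50` returns 53.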

And there you have it: given an appropriate method of dividing the sieve range into segments (best done as fixed sizes related to the CPU cache sizes for memory-access efficiency), this is a method of segmenting the SoA just as used by Bernstein, but somewhat simpler in expression in that it mentions but does not combine the modulo operations and bit packing.
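
To make the segment-driving part concrete, below is a naive F# sketch of such a driver; as a simplification it just re-runs bounded x/y loops against each fixed-size segment buffer instead of keeping per-sequence starting addresses as described above, and all the names (segmentedAtkin, isSmallPrime, segmentSize) are mine rather than Bernstein's.

```fsharp
// Naive segmented Sieve of Atkin driver: one boolean buffer per fixed-size segment.
let segmentedAtkin (low: int) (high: int) (segmentSize: int) =
    // tiny trial-division test, good enough here for base primes up to SQRT(HIGH)
    let isSmallPrime n =
        n >= 2 && seq { 2 .. int (sqrt (float n)) } |> Seq.forall (fun d -> n % d <> 0)
    seq {
        if low <= 2 && 2 <= high then yield 2
        if low <= 3 && 3 <= high then yield 3
        let sqrtHigh = int (sqrt (float high))
        let basePrimes = [| for p in 5 .. sqrtHigh do if isSmallPrime p then yield p |]
        for segLow in (max 5 low) .. segmentSize .. high do
            let segHigh = min high (segLow + segmentSize - 1)
            let buf = Array.create (segHigh - segLow + 1) false
            let toggle n cond =
                if cond && n >= segLow && n <= segHigh then buf.[n - segLow] <- not buf.[n - segLow]
            let xyLimit = int (sqrt (float segHigh))
            for x in 1 .. xyLimit do
                for y in 1 .. xyLimit do
                    let n1 = 4 * x * x + y * y in toggle n1 (n1 % 12 = 1 || n1 % 12 = 5)
                    let n2 = 3 * x * x + y * y in toggle n2 (n2 % 12 = 7)
                    let n3 = 3 * x * x - y * y in toggle n3 (x > y && n3 % 12 = 11)
            // square-free pass: clear multiples of p^2 that fall inside this segment
            for p in basePrimes do
                let pp = p * p
                let start = ((segLow + pp - 1) / pp) * pp
                for k in start .. pp .. segHigh do buf.[k - segLow] <- false
            for i in 0 .. buf.Length - 1 do
                if buf.[i] then yield segLow + i
    }
```

For example, `segmentedAtkin 100 200 60 |> Seq.toList` lists the primes between 100 and 200 using segments of 60 numbers.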