I have implemented the Sieve of Atkin and it works great up to primes nearing 100,000,000 or so. Beyond that, it breaks down because of memory problems.
In the algorithm, I want to replace the memory based array with a hard drive based array. Python's "wb" file functions and Seek functions may do the trick. Before I go off inventing new wheels, can anyone offer advice? Two issues appear at the outset:
- Is there a way to "chunk" the Sieve of Atkin to work on segment in memory, and
- is there a way to suspend the activity and come back to it later - suggesting I could serialize the memory variables and restore them.
Why am I doing this? An old geezer looking for entertainment and to keep the noodle working.
First of all you should make sure that you store your data in an efficient manner. You could easily store the data for up to 100,000,000 primes in 12.5Mb of memory by using bitmap, by skipping obvious non-primes (even numbers and so on) you could make the representation even more compact. This also helps when storing the data on hard drive. You getting into trouble at 100,000,000 primes suggests that you're not storing the data efficiently.
Some hints if you don't receive a better answer.
Yes, for the Eratosthenes-like part what you could do is to run multiple elements in the sieve list in "parallell" (one block at a time) and that way minimize the disk accesses.
The first part is somewhat more tricky, what you would want to do is to process the
4*x**2+y**2,3*x**2+y**2and3*x**2-y**2in a more sorted order. One way is to first compute them and then sort the numbers, there are sorting algorithms that work well on drive storage (still being O(N log N)), but that would hurt the time complexity. A better way would be to iterate overxandyin such a way that you run on a block at a time, since a block is determined by an interval you could for example simply iterate over allxandysuch thatlo <= 4*x**2+y**2 <= hi.In order to achieve this (no matter how and when the program is terminated) you have to first have journalizing disk accesses (fx use a SQL database to keep the data, but with care you could do it yourself).
Second since the operations in the first part are not indempotent you have to make sure that you don't repeat those operations. However since you would be running that part block by block you could simply detect which was the last block processed and resume there (if you can end up with partially processed block you'd just discard that and redo that block). For the Erastothenes part it's indempotent so you could just run through all of it, but for increasing speed you could store a list of produced primes after the sieving of them has been done (so you would resume with sieving after the last produced prime).
As a by-product you should even be able to construct the program in a way that makes it possible to keep the data from the first step even when the second step is running and thereby at a later moment extending the limit by continuing the first step and then running the second step again. Perhaps even having two program where you terminate the first when you've got tired of it and then feeding it's output to the Eratosthenes part (thereby not having to define a limit).