How to improve memory intensive Python scripts?

387 views Asked by At

Today, I wrote a short script for a prime sieve, and I am looking to improve on it. I am rather new to python and programming in general, and so I am wondering: what is a good way to reduce memory usage in a program where large lists of numbers are involved? Here is my example script:

def ES(n):
    A = list(range(2, n+1))
    for i in range(2, n+1):
        for k in range(2, (n+i)//i):
            A[i*k-2] = str(i*k)
    A = [x for x in A if isinstance(x, int)]
    return A

This script turns all composites in the list, A, into strings and then returns the list of remaining integers, which are all prime, yet it runs the A[i*k-2] = str(i*k) three times for the number 12, as it goes through all multiples of 2, then 3 and again for 6. With something like that happening, while storing such a large list, I hit a brick wall rather soon and it crashes. Any advice would be greatly appreciated! Thanks in advance.

EDIT: I don't know if this makes a difference, but I'm using Python 3.3

2

There are 2 answers

2
user2357112 On BEST ANSWER

First, you're using a really weird, inefficient way to record whether something is composite. You don't need to store string representations of the numbers, or even the numbers themselves. You can just use a big list of booleans, where prime[n] is true if n is prime.

Second, there's no reason to worry about wasting a bit of space at the beginning of the list if it simplifies indexing. It's tiny compared to the space taken by the rest of the list, let alone all the strings and ints and stuff you were using. It's like saving $3 on paint for your $300,000 car.

Third, range takes a step parameter you can use to simplify your loops.

def sieve(n):
    """Returns a list of primes less than n."""

    # n-long list, all entries initially True
    # prime[n] is True if we haven't found a factor of n yet.
    prime = [True]*n

    # 0 and 1 aren't prime
    prime[0], prime[1] = False, False

    for i in range(2, n):
        if prime[i]:
            # Loop from i*2 to n, in increments of i.
            # In other words, go through the multiples of i.
            for j in range(i*2, n, i):
                prime[j] = False

    return [x for x, x_is_prime in enumerate(prime) if x_is_prime]
1
jh314 On

Rather than using large lists, you can use generators.

def ESgen():
    d = {}  
    q = 2   
    while 1:
        if q not in d:
            yield q        
            d[q*q] = [q]   
        else:
            for p in d[q]: 
                d.setdefault(p+q,[]).append(p)
            del d[q]      
        q += 1

To get the list of first n primes using the generator, you can do this:

from itertools import islice

gen = ESgen()
list(islice(gen, n))  

If you only want to check if a number is prime, set is faster than list:

from itertools import islice

gen = ESgen()
set(islice(gen, n))