Complexity of converting a set to a frozenset in Python

3.7k views Asked by At

What is the computational complexity of "freezing" a set in Python?

For example, does the second line in

a = {1,2,3}
b = frozenset(a)

require O(n) time? Or is it simply a "view" created in constant time?

2

There are 2 answers

0
jpp On BEST ANSWER

b is not a view of a. You can check this via id:

a = {1, 2, 3}
b = a

id(a) == id(b)  # True

b = frozenset({1, 2, 3})

id(a) == id(b)  # False

Hence a change in b will not be reflected in a. You can, of course, test this yourself. Since frozenset takes an iterable as an argument, it must iterate over each argument. This is an O(n) process.

As an aside, there's nothing special about frozenset, even creating a set from a set has O(n) time complexity:

for i in [10**5, 10**6, 10**7]:
    a = set(range(i))
    %timeit set(a)

100 loops, best of 3: 3.33 ms per loop
10 loops, best of 3: 30.2 ms per loop
1 loop, best of 3: 421 ms per loop   
4
Chris Bouchard On

While the complexity doesn't change, the runtime is different depending whether the input to the frozenset initializer is an iterator, list, tuple, or set.

I ran the following shell script in ZSH:

for i in 5 6 7
do
    for f in '' 'list' 'tuple' 'set'
    do
        echo "i=$i f=$f"
        python -m timeit -s "x=$f(range(10**$i))" 'frozenset(x)'
    done
done

Which produced the following results on my laptop using Python 3.11.2:

i=5 f=
100 loops, best of 5: 2.37 msec per loop
i=5 f=list
200 loops, best of 5: 1.31 msec per loop
i=5 f=tuple
200 loops, best of 5: 1.41 msec per loop
i=5 f=set
500 loops, best of 5: 609 usec per loop
i=6 f=
10 loops, best of 5: 25.6 msec per loop
i=6 f=list
20 loops, best of 5: 14.4 msec per loop
i=6 f=tuple
20 loops, best of 5: 14.5 msec per loop
i=6 f=set
50 loops, best of 5: 5.97 msec per loop
i=7 f=
1 loop, best of 5: 256 msec per loop
i=7 f=list
2 loops, best of 5: 144 msec per loop
i=7 f=tuple
2 loops, best of 5: 148 msec per loop
i=7 f=set
5 loops, best of 5: 76.4 msec per loop

It was consistently fastest to initialize the frozenset from an input set; initializing from a list and tuple were about tied; and initializing directly from the range iterator was slowest (probably because it was including the iteration, which happened during setup for the others).

This indicates to me that frozenset might be able to reuse more of the state of a set—even if it has to make a copy to remain immutable. For other iterable types, frozenset may have to build its state from scratch.


One of the comments suggested that it might just be that a set's iterator is faster than the iterator for a tuple or list. Here's the same script, but this time simply iterating rather than building a frozenset:

for i in 5 6 7
do
    for f in '' 'list' 'tuple' 'set'
    do
        echo "i=$i f=$f"
        python -m timeit -s "x=$f(range(10**$i))" 'for _ in x: pass'
    done
done

And here are the results:

i=5 f=
200 loops, best of 5: 1.52 msec per loop
i=5 f=list
500 loops, best of 5: 419 usec per loop
i=5 f=tuple
1000 loops, best of 5: 359 usec per loop
i=5 f=set
500 loops, best of 5: 688 usec per loop
i=6 f=
20 loops, best of 5: 15 msec per loop
i=6 f=list
50 loops, best of 5: 4.34 msec per loop
i=6 f=tuple
100 loops, best of 5: 3.64 msec per loop
i=6 f=set
50 loops, best of 5: 6.3 msec per loop
i=7 f=
2 loops, best of 5: 147 msec per loop
i=7 f=list
5 loops, best of 5: 43.5 msec per loop
i=7 f=tuple
5 loops, best of 5: 38.1 msec per loop
i=7 f=set
5 loops, best of 5: 58.4 msec per loop

(Note: This was run on a different machine than the previous result, so the absolute numbers are probably not comparable. I did test to make sure the previous result still holds. All we care about in the following are the relative results.)

Simply iterating over a set was significantly slower than iterating over a list or a tuple. That makes complete sense to me—lists and tuples are backed by arrays, so you really can't do much better for iterating.

Even if sets are also stored as arrays, they'll be sparse due to hashing. Python presumably either keeps a separate internal list for iteration, or skips the empty slots.

Yet, despite being slower to iterate, it's still faster to initialize a frozenset from a set.