Demonstration of sets being faster than arrays?

Question

Demonstration of sets being faster than arrays?

45 views Asked by Katsu At 17 October 2022 at 21:12

The goal is to find the intersection of 2 arrays. I know you're supposed to use sets to get a faster runtime. However, in my notebook I was surprised to find that my runtime is not faster with the sets method. (89 ms (array) vs. 220 ms (sets)). Is there something wrong with my code or something I'm misunderstanding?

A = range(50,1000000)
B = range(-1000000,60)

%%time
# O(m*n)?
def intersection(A, B):
  array = []
  for i in A:
    if i in B:
      array.append(i)
  return array

intersection(A,B)

%%time
# O(1)?
def intersection(A, B):
  set_a = set(A)
  set_b = set(B)
  return [i for i in set_a if i in set_b]
  # return set_a.intersection(set_b)

intersection(A,B)

Original Q&A

There are 2 answers

Anthony DiGiovanna On 17 October 2022 at 21:22

I think the reason you're seeing this slowdown is because the set based intersection function is converting all of the data into a set, and this extra conversion step takes some time. When I do the conversion first, I get a faster time with the set than the list (23 ms vs 36 ms)

**Andrey Tyukin** · Accepted Answer · 2022-10-17T21:24:54+00:00

You are not comparing arrays to sets.

You are comparing ranges to sets.

Checking the membership in a range is almost instantaneous (two comparisons: one with the lower, one with the upper bound).

Checking the membership in an array would take millions of iterations in this case, which would be substantially longer than membership checks for sets.

If you convert the ranges to lists first, you see a huge difference:

import time

A = range(50,100000)
B = range(-100000,60)

def intersectionArrays(A, B):
  array = []
  arr_a = list(A)
  arr_b = list(B)
  for i in arr_a:
    if i in arr_b:
      array.append(i)
  return array


def intersectionSets(A, B):
  set_a = set(A)
  set_b = set(B)
  return [i for i in set_a if i in set_b]
  # return set_a.intersection(set_b)



startArr = time.process_time()
intersectionArrays(A,B)
endArr = time.process_time()

startSet = time.process_time()
intersectionSets(A,B)
endSet = time.process_time()

print(endArr - startArr)
print(endSet - startSet)

It prints something like

Arrays: 39.871636528
Sets:   0.008161553000000765

There is no point trying it with 1000000 - it would take hours.

Furthermore, in order to measure something vaguely realistic, you should shuffle the numbers, otherwise branch prediction might distort the results.

TechQA.

Demonstration of sets being faster than arrays?

There are 2 answers

Related Questions in PYTHON

Related Questions in ARRAYS

Related Questions in SET

Related Questions in SET-INTERSECTION

Popular Questions

Trending Questions