ThreadPoolExecutor too fast for CPU bound task


I'm trying to understand how ThreadPoolExecutor and ProcessPoolExecutor work. My assumption for this test was that a CPU-bound task such as incrementing a counter wouldn't benefit from a ThreadPoolExecutor: the loop never releases the GIL, so only one thread can execute Python bytecode at a time.

import time
from random import random
from concurrent.futures import ThreadPoolExecutor


@measure_execution_time  # user-defined timing decorator (not shown)
def cpu_slow_function(item):
    start = time.time()
    duration = random()  # random deadline between 0 and 1 second (~0.5 s on average)
    counter = 1
    while time.time() - start < duration:
        counter += 1
    return item, counter


def test_thread_pool__cpu_bound():
    """
    100 tasks averaging 0.5 seconds each would take ~50 seconds to complete sequentially.
    """

    items = list(range(100))

    with ThreadPoolExecutor(max_workers=100) as executor:
        results = list(executor.map(cpu_slow_function, items))

    for index, (result, counter) in enumerate(results):
        assert result == index
        assert counter >= 0.0

To my surprise, this test takes only about 5 s to finish. Based on my assumptions it should take ~50 s: 100 tasks averaging 0.5 s each.

What am I missing?


There are 2 answers

Answered by Solomon Slow (accepted answer):

The GIL does not prevent Python threads from running concurrently. It only prevents more than one thread from executing Python bytecode at any given moment in time.

At any given moment, your program has one worker that is actually in the middle of executing a bytecode instruction and 99 workers that are all awaiting their chance to execute their next one. In the meantime, the time.time() clock keeps ticking (i.e., real time keeps passing) for all of them, so all 100 deadlines expire in overlapping wall time.
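
One way to see this from the question's own numbers is to sum the counters the workers return: the 100 deadlines all expire in overlapping wall time, but the combined iteration count is roughly what a single thread could do in that same wall time, because only one worker executes bytecode at any moment. A minimal, self-contained sketch (it reuses the loop from the question, without the timing decorator):

import time
from random import random
from concurrent.futures import ThreadPoolExecutor


def cpu_slow_function(item):
    # Busy-wait until a random deadline (0-1 s, ~0.5 s on average) expires,
    # counting how many loop iterations this worker manages to run.
    start = time.time()
    duration = random()
    counter = 1
    while time.time() - start < duration:
        counter += 1
    return item, counter


if __name__ == "__main__":
    wall_start = time.time()
    with ThreadPoolExecutor(max_workers=100) as executor:
        results = list(executor.map(cpu_slow_function, range(100)))
    wall = time.time() - wall_start

    # Wall time is far below the ~50 s a sequential run would need, because
    # every worker's deadline ticks simultaneously; the summed counters,
    # however, amount to roughly one core's worth of bytecode execution.
    total = sum(counter for _, counter in results)
    print(f"wall time: {wall:.1f}s, total iterations: {total}")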

Answered by SIGHUP:

You can see the effect of the GIL in the following variation of your code:

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from time import perf_counter, time
from os import cpu_count

def func(item: int) -> tuple[int, int]:
    # Busy-wait for 0.5 s of wall-clock time, counting iterations.
    start = time()
    counter = 1
    while time() - start < 0.5:
        counter += 1
    return item, counter

def main():
    # Same pool size for both executors: number of CPUs minus one.
    n = max(2, cpu_count() or 2) - 1
    for executor in ProcessPoolExecutor, ThreadPoolExecutor:
        with executor(n) as exe:
            begin = perf_counter()
            for _ in exe.map(func, range(100)):
                pass
            duration = perf_counter() - begin
            print(executor.__name__, f"{duration:.4f}s")

if __name__ == "__main__":
    main()

For better statistical analysis, use a constant duration rather than a variable (pseudo-random) one; here we use 0.5 s.

Also, you need to ensure that the process and thread pools are the same size; here that size is the number of CPUs minus one.

Output:

ProcessPoolExecutor 7.5492s
ThreadPoolExecutor 7.8219s

We see that multi-threading is a little slower than multiprocessing in this case because func() is entirely CPU bound.

Note:

Tested on Apple Silicon M2 where os.cpu_count() == 8
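
To make the GIL's cost show up in the timings themselves, give each task a fixed amount of pure-Python work instead of a wall-clock deadline. On standard CPython (GIL enabled), the thread pool then takes roughly the sequential time while the process pool scales with the pool size. A minimal sketch along the same lines (the iteration count is arbitrary and the helper name is mine):

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from os import cpu_count
from time import perf_counter


def fixed_work(item: int, iterations: int = 2_000_000) -> int:
    # A fixed amount of pure-Python work, independent of wall-clock time.
    counter = 0
    for _ in range(iterations):
        counter += 1
    return item


def main():
    n = max(2, cpu_count() or 2) - 1
    for executor in ProcessPoolExecutor, ThreadPoolExecutor:
        with executor(n) as exe:
            begin = perf_counter()
            list(exe.map(fixed_work, range(100)))
            print(executor.__name__, f"{perf_counter() - begin:.4f}s")


if __name__ == "__main__":
    main()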