Python multiprocessing zombie processes


I have a simple implementation using Python's multiprocessing module:

from multiprocessing import Process


def worker(url_data):
    """worker function"""
    print(url_data['link'])


if __name__ == '__main__':
    jobs = []

    while True:
        for i in range(40):
            # fetch one by one from redis queue
            #item = item from redis queue
            p = Process(name='worker '+str(i), target=worker, args=(item,))

            # if p is not running, start p
            if not p.is_alive():
                jobs.append(p)
                p.start()

        for j in jobs:
            j.join()
            jobs.remove(j)

What I expect this code to do:

  1. Run in an infinite loop, waiting on the Redis queue.
  2. If the Redis queue is not empty, fetch an item.
  3. Create 40 multiprocessing.Process workers, no more, no less.
  4. If a process has finished, start a new one, so that ~40 processes are running at all times.

I read that, to avoid zombie processes, children should be joined by the parent; that is what I tried to achieve in the second loop. But the issue is that on launch it spawns 40 processes, the workers finish and sit in a zombie state until all of the currently spawned processes have finished, and only then, on the next iteration of "while True", does the same pattern repeat.
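For reference, a minimal standalone sketch (illustrative only, not the code above): a child that has exited stays a zombie until it is joined, and multiprocessing.active_children() or a zero-timeout join reaps finished children without blocking on the ones still running.

from multiprocessing import Process, active_children
import time


def noop():
    """Child that exits almost immediately."""
    pass


if __name__ == '__main__':
    procs = [Process(target=noop) for _ in range(4)]
    for p in procs:
        p.start()

    time.sleep(1)        # children have exited, but stay zombies until joined
    active_children()    # side effect: joins (reaps) already-finished children

    # non-blocking alternative: keep only the children still running
    procs = [p for p in procs if p.is_alive()]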

So my question is: how can I avoid zombie processes and spawn a new process as soon as one of the 40 has finished?

1 Answer

Paolo Casciello (accepted answer):

For a task like the one you described, it is usually better to use a different approach based on Pool.

You can have the main process fetch the data and let the workers deal with it.

Here is the Pool example from the Python docs:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result = pool.apply_async(f, [10])    # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))          # prints "100" unless your computer is *very* slow
    print(pool.map(f, range(10)))         # prints "[0, 1, 4,..., 81]"

I also suggest using imap instead of map, as it seems your task can be asynchronous.
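A small sketch of the difference (the square function here is just an example): map blocks until every result is ready and returns a list, while imap/imap_unordered return an iterator that yields results as the workers finish them.

from multiprocessing import Pool


def square(x):
    return x * x


if __name__ == '__main__':
    pool = Pool(4)

    print(pool.map(square, range(10)))           # blocks until the whole list is ready

    for result in pool.imap_unordered(square, range(10)):
        print(result)                            # results arrive as workers finish them

    pool.close()
    pool.join()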

Roughly, your code would become:

from multiprocessing import Pool


def worker(url_data):
    """worker function"""
    print(url_data['link'])


if __name__ == '__main__':
    p = Pool(40)

    while True:
        # items = items from redis queue
        p.imap_unordered(worker, items)  # unordered version is faster
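A fuller sketch along the same lines, with the elided pieces filled in under assumptions: the redis-py client, a Redis list named 'jobs', and JSON-encoded items are all assumptions, not from the original post.

import json
from multiprocessing import Pool

import redis


def worker(url_data):
    """worker function"""
    print(url_data['link'])


def redis_items(r, key='jobs'):
    """Yield decoded items from the Redis list, blocking while it is empty."""
    while True:
        _, raw = r.blpop(key)      # blocks until an item is available
        yield json.loads(raw)


if __name__ == '__main__':
    r = redis.Redis()
    pool = Pool(40)

    # imap_unordered hands each item to a free worker; iterating the result
    # consumes results as they finish, and the Pool manages (and reaps) its
    # child processes, so no zombies accumulate.
    for _ in pool.imap_unordered(worker, redis_items(r)):
        pass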