I have a simple implementation using Python's multiprocessing module:
```python
from multiprocessing import Process

def worker(url_data):
    """worker function"""
    print url_data['link']

if __name__ == '__main__':
    jobs = []
    while True:
        for i in range(40):
            # fetch one by one from redis queue
            #item = item from redis queue
            p = Process(name='worker '+str(i), target=worker, args=(item,))
            # if p is not running, start p
            if not p.is_alive():
                jobs.append(p)
                p.start()
        for j in jobs:
            j.join()
            jobs.remove(j)
```
What I expect this code to do:
- run in an infinite loop, waiting on the Redis queue.
- if the Redis queue is not empty, fetch an item.
- create 40 `multiprocessing.Process` workers, no more, no less.
- if a process has finished, start a new one, so that ~40 processes are running at all times.
I read that, to avoid zombie processes, children should be joined to the parent; that is what I tried to achieve with the second loop. But the issue is that on launch it spawns 40 processes, the workers finish and sit in a zombie state until all of the currently spawned processes have finished, and only then, in the next iteration of the `while True` loop, does the same pattern repeat.
So my question is: how can I avoid zombie processes and spawn a new process as soon as one of the 40 has finished?
For a task like the one you described, it is usually better to use a different approach based on `Pool`: you can have the main process fetch the data and let the pool workers deal with it. The example below follows the `Pool` pattern from the Python docs. I also suggest using `imap` instead of `map`, as your task appears to be asynchronous. Roughly, your code will be:
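Here is a minimal sketch of that approach. The `iter_redis_queue` helper, the queue name, and the assumption that items are stored as JSON are all illustrative, since your post does not show how items are fetched from Redis; the blocking fetch uses redis-py's `blpop`.

```python
import json
from multiprocessing import Pool

import redis  # assumes the redis-py package is installed

def worker(url_data):
    """worker function"""
    print(url_data['link'])

def iter_redis_queue(queue_name='work_queue'):
    """Hypothetical helper: block on a Redis list and yield one
    decoded item at a time, forever."""
    client = redis.Redis()
    while True:
        _key, raw = client.blpop(queue_name)  # blocks until an item arrives
        yield json.loads(raw)                 # assuming items are JSON-encoded

if __name__ == '__main__':
    pool = Pool(processes=40)

    # imap pulls items from the generator and feeds each one to the
    # next idle worker, so a new job starts as soon as one finishes;
    # the pool reaps its own children, so no zombies accumulate.
    for _ in pool.imap(worker, iter_redis_queue()):
        pass
```

Because the pool keeps 40 long-lived worker processes and joins them itself, the zombie problem disappears, and `imap` hands each queued item to the next idle worker so roughly 40 jobs stay in flight at all times.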