I have a simple implementation using Python's multiprocessing module:
```python
from multiprocessing import Process

def worker(url_data):
    """worker function"""
    print url_data['link']

if __name__ == '__main__':
    jobs = []
    while True:
        for i in range(40):
            # fetch one by one from redis queue
            #item = item from redis queue
            p = Process(name='worker '+str(i), target=worker, args=(item,))
            # if p is not running, start p
            if not p.is_alive():
                jobs.append(p)
                p.start()
        for j in jobs:
            j.join()
            jobs.remove(j)
```
What I expect this code to do:
- run in an infinite loop, waiting on the Redis queue.
- if the Redis queue is not empty, fetch an item.
- create 40 `multiprocessing.Process` workers, no more, no less.
- if a process has finished, start a new one, so that ~40 processes are running at all times.
I read that, to avoid zombie processes, children should be joined to the parent; that is what I tried to achieve with the second loop. But the issue is that on launch it spawns 40 processes, the workers finish and sit in a zombie state until all of the currently spawned processes have finished, and only then, in the next iteration of the `while True` loop, does the same pattern repeat.
So my question is: how can I avoid zombie processes and spawn a new process as soon as one of the 40 has finished?
For a task like the one you described, it is usually better to use a different approach based on `Pool`: you can have the main process fetch the data and let the pool workers deal with it. The example below follows the `Pool` pattern from the Python docs. I also suggest using `imap` instead of `map`, as your task appears to be asynchronous. Roughly, your code will be:
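Here is a minimal sketch of that approach. The `iter_redis_queue` helper, the queue name, and the assumption that items are stored as JSON are all illustrative, since your post does not show how items are fetched from Redis; the blocking fetch uses redis-py's `blpop`.

```python
import json
from multiprocessing import Pool

import redis  # assumes the redis-py package is installed

def worker(url_data):
    """worker function"""
    print(url_data['link'])

def iter_redis_queue(queue_name='work_queue'):
    """Hypothetical helper: block on a Redis list and yield one
    decoded item at a time, forever."""
    client = redis.Redis()
    while True:
        _key, raw = client.blpop(queue_name)  # blocks until an item arrives
        yield json.loads(raw)                 # assuming items are JSON-encoded

if __name__ == '__main__':
    pool = Pool(processes=40)

    # imap pulls items from the generator and feeds each one to the
    # next idle worker, so a new job starts as soon as one finishes;
    # the pool reaps its own children, so no zombies accumulate.
    for _ in pool.imap(worker, iter_redis_queue()):
        pass
```

Because the pool keeps 40 long-lived worker processes and joins them itself, the zombie problem disappears, and `imap` hands each queued item to the next idle worker so roughly 40 jobs stay in flight at all times.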