python multiprocessing script does not exit


I am trying to get a little more comfortable with the Python 2.7 multiprocessing module. I have written a small script that takes filenames and the desired number of processes as input, and then starts multiple processes to apply a function to each file name in my queue. It looks like this:

import multiprocessing, argparse, sys
from argparse import RawTextHelpFormatter

def parse_arguments():
    descr='%r\n\nTest different functions of multiprocessing module\n%r' % ('_'*80, '_'*80)
    parser=argparse.ArgumentParser(description=descr.replace("'", ""), formatter_class=RawTextHelpFormatter)
    parser.add_argument('-f', '--files', help='list of filenames', required=True, nargs='+')
    parser.add_argument('-p', '--processes', help='number of processes for script', default=1, type=int)
    args=parser.parse_args()
    return args 

def print_names(name):
    print name


###MAIN###

if __name__=='__main__':
    args=parse_arguments()
    q=multiprocessing.Queue()
    procs=args.processes
    proc_num=0
    for name in args.files:
        q.put(name)
    while q.qsize()!=0:
        for x in xrange(procs):
            proc_num+=1
            file_name=q.get()
            print 'Starting process %d' % proc_num
            p=multiprocessing.Process(target=print_names, args=(file_name,))
            p.start()
            p.join()
            print 'Process %d finished' % proc_num

The script works fine and starts a new process every time an old process finishes (I think that's how it works?), until all objects in the queue are used up. However, the script does not exit after completing the queue, but sits idle and I have to kill it using Ctrl+C. What is the problem here?

Thanks for your answers!

There is 1 answer

4
jbndlr On

It seems you've mixed a few things up there. You spawn a process, have it do its work, and wait for it to exit before starting a new process in the next iteration. With this approach you are stuck in sequential processing; no actual multiprocessing is being performed here.
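As for the hang itself, one likely cause (an assumption, since the question does not show the exact command line) is that the number of files is not a multiple of the value passed to -p. The inner for loop then calls q.get() on an already empty queue, and a blocking get() waits forever because nothing is ever put into the queue again. The sketch below only models the loop structure with a counter; `would_block` is a hypothetical helper, not part of the original script.

```python
def would_block(n_items, procs):
    """Model the question's loop; return True if q.get() would block."""
    remaining = n_items              # stands in for q.qsize()
    while remaining != 0:            # outer `while q.qsize()!=0`
        for _ in range(procs):       # inner `for x in xrange(procs)`
            if remaining == 0:
                # The real script calls q.get() here on an empty queue,
                # which waits forever since no more items are put.
                return True
            remaining -= 1           # one successful q.get()
    return False


if __name__ == '__main__':
    print(would_block(3, 2))  # 3 files, 2 processes
    print(would_block(4, 2))  # 4 files, 2 processes
```

With the default of one process the inner loop runs once per check of the queue size, so this particular problem would not show up.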

Maybe you want to take this as a starting point:

import sys
import os
import time
import multiprocessing as mp

def work_work(q):
    # Draw work from the queue. q.get() blocks while the queue is
    # empty, so a None sentinel tells the worker when to stop.
    item = q.get()
    while item is not None:
        # Print own process id and the item drawn from the queue
        print(os.getpid(), item)
        # Sleep is only for demonstration here. Usually, you
        # do not want to use this! In this case, it gives the processes
        # the chance to "work" in parallel, otherwise one process
        # would have finished the entire queue before a second one
        # could be spawned, because this work is quickly done.
        time.sleep(0.1)
        # Draw new work
        item = q.get()

if __name__=='__main__':
    nproc = 2  # Number of processes to be used
    procs = [] # List to keep track of all processes

    work = [chr(i + 65) for i in range(5)]
    q = mp.Queue() # Create a queue...
    for w in work:
        q.put(w) # ...and fill it with some work.
    for _ in range(nproc):
        q.put(None) # One stop sentinel per worker process.

    for _ in range(nproc):
        # Spawn new processes and pass each of them a reference
        # to the queue where they can pull their work from.
        procs.append(mp.Process(target=work_work, args=(q,)))
        # Start the process just created.
        procs[-1].start()

    for p in procs:
        # Wait for all processes to finish their work. Each one
        # exits as soon as it draws a None sentinel from the queue.
        p.join()
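
If the per-item work is a plain function call, multiprocessing.Pool may be a simpler starting point than managing the queue and the processes by hand. The following is a minimal sketch, not the original poster's code: `tag_name` and `run_pool` are hypothetical names, and it is written for Python 3 (on 2.7, Pool cannot be used in a with statement, so you would call pool.close() and pool.join() yourself).

```python
import multiprocessing as mp
import os


def tag_name(name):
    # Hypothetical stand-in for the real per-file work: return the
    # worker's process id together with the name it handled.
    return (os.getpid(), name)


def run_pool(names, nproc):
    # Pool starts nproc worker processes and distributes the names
    # among them. map() returns the results in input order, and the
    # workers are shut down when the with-block ends.
    with mp.Pool(processes=nproc) as pool:
        return pool.map(tag_name, names)


if __name__ == '__main__':
    for pid, name in run_pool(['A', 'B', 'C', 'D', 'E'], 2):
        print(pid, name)
```

In the script from the question, this would roughly correspond to calling run_pool(args.files, args.processes) inside the `if __name__=='__main__':` block.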