Pool map stuck in infinite loop when newly created paths are passed through


I want to create multiple folders from a base folder and then run an executable in each of those folders in parallel. It works fine when the folders already exist. However, it gets stuck in an infinite loop when I create the folders and then run the multiprocessing in the same Python script.

Working code (when address_a and address_b already exist):

import os
from multiprocessing import Pool

Base_address='C:\\Users\\bappi\\Desktop\\Base_address'
address_a='C:\\Users\\bappi\\Desktop\\address_a'
address_b='C:\\Users\\bappi\\Desktop\\address_b'


Folders=['']*2

Folders[0]=address_a
Folders[1]=address_b


def call_exe(address):
    os.chdir(address)
    exe_file='Test.exe'
    os.system(exe_file)

if __name__ == '__main__':
    with Pool(2) as p:
        p.map(call_exe,Folders)

However, when address_a and address_b do not exist and I want to create them from the base folder, the following code does not work.

Not working code (when I create address_a and address_b as copies of the base folder):

import os
from multiprocessing import Pool
import shutil

Base_address='C:\\Users\\bappi\\Desktop\\Base_address'
address_a='C:\\Users\\bappi\\Desktop\\address_a'
address_b='C:\\Users\\bappi\\Desktop\\address_b'

shutil.copytree(Base_address, address_a)     # folder created successfully
shutil.copytree(Base_address, address_b)     # folder created successfully

Folders=['']*2

Folders[0]=address_a
Folders[1]=address_b


def call_exe(address):
    os.chdir(address)
    exe_file='Test.exe'
    os.system(exe_file)

if __name__ == '__main__':
    with Pool(2) as p:
        p.map(call_exe,Folders)   # This one gets stuck in an infinite loop


1 Answer

Answered by Booboo

If you hold your cursor over the multiprocessing tag you will see its description:

[screenshot of the multiprocessing tag description]

When questions are posted with the multiprocessing tag it is important to always specify the platform you are running on, as it can make the difference between your code working or not.

Since you did not specify the platform I will, perhaps erroneously, assume that it is one that uses the spawn method rather than the fork method to create new processes. Even if that is not the case you may find some value in this answer for the future.

When the spawn method is used to create processes and you are using multiprocessing (a multiprocessing pool in this case), the child processes are created with initially uninitialized memory. Then, for each child process, the Python interpreter is loaded and re-executes the original source program. Therefore, every statement at global scope (import statements, function definitions, etc.) will be executed to initialize memory before your worker function call_exe is called.
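
To make this concrete, here is a minimal sketch (my own illustration, not from the original post): with the spawn start method, the first print appears once in the main process and once more in each pool process, while the second appears only once.

import os
from multiprocessing import Pool, set_start_method

# Runs in the main process AND in every spawned pool process,
# because each child re-imports this module from the top.
print(f'global scope executed in process {os.getpid()}')

def worker(x):
    return x * 2

if __name__ == '__main__':
    set_start_method('spawn')  # force spawn so the effect shows on any platform
    # Runs only once, in the main process.
    print(f'main block executed in process {os.getpid()}')
    with Pool(2) as p:
        print(p.map(worker, [1, 2]))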

So if you are creating a pool with N processes, the statements at global scope will be executed N times, once for each process. Some of these statements may not be required for the proper initialization of the worker function, for example an import statement for a module/package the worker function does not use, but executing them may cause no real harm. Other unnecessary global statements, however, might cause irreparable harm or add inefficiency because they waste CPU and/or memory resources. Therefore, if you have anything at global scope that you do not want executed in the child processes, you must enclose those statements within the check if __name__ == '__main__':, which only evaluates as True in the initial, main process. You have at global scope the statements (among others):

shutil.copytree(Base_address, address_a)     # folder created successfully
shutil.copytree(Base_address, address_b)     # folder created successfully

Each of these statements will be executed once per pool process, i.e. twice in total, since you have a pool of size 2. Because the directories will already exist by then, the initialization of the pool processes will necessarily raise an exception. If you are running Python 3.8 or greater you can specify the dirs_exist_ok=True argument to allow for the directory already existing. But even then, while the first pool process is processing the directory in worker function call_exe, the second pool process might be modifying that directory with its own call to copytree. These are statements at global scope causing irreparable harm.
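
As a small illustration (assuming Python 3.8+ and the same paths as in the question), a second copytree call to an existing destination raises FileExistsError unless dirs_exist_ok=True is passed:

import shutil

base_address = 'C:\\Users\\bappi\\Desktop\\Base_address'
address_a = 'C:\\Users\\bappi\\Desktop\\address_a'

shutil.copytree(base_address, address_a)                      # creates address_a
# shutil.copytree(base_address, address_a)                    # would raise FileExistsError
shutil.copytree(base_address, address_a, dirs_exist_ok=True)  # copies into the existing folder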

In the code below I have moved all statements that are not required for call_exe to work into the if __name__ == '__main__': block (some of these, such as the import statements or the definitions of the folders, could have been left outside the name check as in your original code and would cause no harm except wasting a few CPU cycles when initializing your pool processes).

I have also tried to follow the PEP 8 – Style Guide for Python Code in the naming of variables, spacing, etc. I also found your use of the word "address" to refer to a folder/directory name a bit unusual (path, directory, directory_name, folder or folder_name would all have been more meaningful to someone reading the code).

import os

def call_exe(folder):
    os.chdir(folder)
    exe_file = 'Test.exe'
    os.system(exe_file)


if __name__ == '__main__':
    from multiprocessing import Pool
    import shutil

    # A much simpler way to initialize the folders list
    # and to create the directories:

    base_folder = 'C:\\Users\\bappi\\Desktop\\Base_address'

    folders = [
        'C:\\Users\\bappi\\Desktop\\address_a',
        'C:\\Users\\bappi\\Desktop\\address_b'
    ]

    for folder in folders:
        shutil.copytree(base_folder, folder, dirs_exist_ok=True)  # copy base folder; ok if folder already exists

    with Pool(2) as p:
        p.map(call_exe, folders)
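
As a further variation (my own suggestion, not part of the original answer), subprocess.run with its cwd argument can run the executable in each folder without changing the worker process's current directory via os.chdir:

import os
import subprocess

def call_exe(folder):
    # Run Test.exe (full path) with the working directory set to the folder,
    # so any files it writes end up there.
    subprocess.run(os.path.join(folder, 'Test.exe'), cwd=folder)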