I am trying to run two separate Python scripts in parallel. To do this I am using ProcessPoolExecutor() from the concurrent.futures module. I have made two test scripts which I want to run in parallel:
Script 1:
import time

def run():
    print('starting script 1')
    t = 3
    print(f'Running script 1: Sleeping {t} second(s)')
    time.sleep(t)
    return None

if __name__ == '__main__':
    run()
Script 2:
import time

def run():
    print('starting script 2')
    t = 4
    print(f'Sleeping {t} second(s)')
    time.sleep(t)
    return None

if __name__ == '__main__':
    run()
These scripts simply sleep for 3 and 4 seconds, respectively. To call these scripts I use the following code:
import time
import concurrent.futures
import test_script_1, test_script_2

if __name__ == '__main__':
    start = time.perf_counter()
    print("Starting with scripts execution")
    script_list = ['test_script_1', 'test_script_2']
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for script in script_list:
            executor.map(exec(f'{script}.run()'))
    print('all scripts done')
    finish = time.perf_counter()
    print(f'Finished in {round(finish-start,2)} second(s)')
All scripts are in the same folder. I would expect these scripts to run in parallel, but they don't. The output I get in my console is the following:
Starting with scripts execution
starting script 1
Running script 1: Sleeping 3 second(s)
starting script 2
Sleeping 4 second(s)
all scripts done
Finished in 7.0 second(s)
As you can see, the script takes 7 seconds to run in total, rather than the expected 4 seconds. What am I doing wrong?
I tried switching to ThreadPoolExecutor(), which does the exact same thing.
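As it turns out, the 7-second total can be reproduced without the executor at all: exec() evaluates its argument string immediately, in the calling process, and always returns None, so the two scripts run sequentially before executor.map() ever receives anything to schedule. A minimal sketch of that behavior (an illustrative snippet, not taken from the original scripts):

```python
import time

start = time.perf_counter()
# exec() runs the string right here, in the current process, and
# always returns None -- so nothing useful ever reaches executor.map()
returned = exec("time.sleep(1)")
elapsed = time.perf_counter() - start
print(returned, round(elapsed, 2))
```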
Update:
I have implemented the code provided by OM222O:
import time
from concurrent.futures import ProcessPoolExecutor as PPE
from os import cpu_count
import test_script_1, test_script_2

if __name__ == '__main__':
    start = time.perf_counter()
    print("Starting with scripts execution")
    print("Available CPU cores: ", cpu_count())
    script_list = [test_script_1.run, test_script_2.run]
    with PPE() as executor:
        futures = [executor.submit(script) for script in script_list]
        # wait for scripts to finish execution
        # for future in futures:
        #     future.result()
    print('all scripts done')
    finish = time.perf_counter()
    print(f'Finished in {(finish - start):.2f} seconds')
This seems to be working: the total execution time is 4 seconds. However, the print() statements from the individual scripts do not appear in the console:
Starting with scripts execution
Available CPU cores: 40
all scripts done
Finished in 4.40 seconds
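Note that the total is still ~4 seconds even with the future.result() loop commented out: leaving the with block calls executor.shutdown(wait=True), which blocks until every submitted task has finished. A quick sketch of that behavior (ThreadPoolExecutor is used only to keep the snippet light; ProcessPoolExecutor waits the same way on exit):

```python
import time
from concurrent.futures import ThreadPoolExecutor

start = time.perf_counter()
with ThreadPoolExecutor() as executor:
    executor.submit(time.sleep, 1)
# leaving the with-block calls executor.shutdown(wait=True), which
# blocks until every submitted task has finished
elapsed = time.perf_counter() - start
print(f'waited {elapsed:.2f} second(s)')
```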
If I switch to ThreadPoolExecutor(), the script outputs are printed:
Starting with scripts execution
Available CPU cores: 40
starting script 1
Running script 1: Sleeping 3 second(s)
starting script 2
Sleeping 4 second(s)
all scripts done
Finished in 4.00 seconds
However, since the scripts I actually intend to run in parallel are CPU-bound rather than I/O-bound, I would like to use ProcessPoolExecutor() and not ThreadPoolExecutor(). I am using Spyder 5.4.2.
Answer (from OM222O):

To be honest, I'm not sure why your code runs for 7 seconds, but you are using map incorrectly; the whole point of map is to avoid writing for x in y, which is exactly what you're doing. I think the way you wrote your code means it executes all the scripts (which takes 7 seconds) for every item in the list. Also check that you actually have multiple cores available, or try passing os.cpu_count() to PPE, i.e. PPE(os.cpu_count()). This works for me (I only included main.py since the other modules are easy to copy from the image):
main.py: [posted as an image in the original answer]
Here are the modules and the results: [images]