How to use multiprocessing in Python with big data?


I want to make an app, but it has to calculate a lot of things, so I want to use multiprocessing to speed it up, but I can't make it work. What I did that somewhat worked was copying all the data to every process, but that took too much memory.
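Roughly, the version that worked looked like the sketch below (not my exact code, just the same idea with the same placeholder file names): the dataframes are passed along with every task, so each worker receives its own pickled copy and the memory use blows up.

from concurrent.futures import ProcessPoolExecutor
from functools import partial
import pandas as pd

def test(n, distances, position):
    # every task receives its own pickled copy of both dataframes
    return position.shape[0]

def main():
    distances = pd.read_csv('very_large_file.csv')
    position = pd.read_csv('another_very_large_file.csv')

    with ProcessPoolExecutor(5) as executor:
        # partial bundles the dataframes with the function, so they are
        # pickled and shipped to the workers with each task
        worker = partial(test, distances=distances, position=position)
        for result in executor.map(worker, range(5)):
            print(result)

if __name__ == '__main__':
    main()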

What I tried instead, to avoid copying:

from concurrent.futures import ProcessPoolExecutor
import pandas as pd
import time

variables = {} # meant to hold the loaded dataframes so test() can access them later

def test(n):
    distances = variables['distances']
    position = variables['position']
    time.sleep(350) # stands in for the long calculation
    return position

def main():
    variables['distances'] = pd.read_csv('very_large_file.csv') # imagine a file with a lot of data, around 2 GB
    variables['position'] = pd.read_csv('another_very_large_file.csv') # another file with a lot of data, around 2 GB

    with ProcessPoolExecutor(5) as executor:
        for result in executor.map(test, range(5)):
            print(result)

if __name__ == '__main__':
    main()

This code raises a KeyError: 'distances' is not in variables when test() runs in the worker processes.
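I suspect this happens because the pool's default start method (spawn on Windows and macOS) launches fresh interpreters that import the module again but never run main(), so variables is empty inside the workers. From what I understand, a per-worker initializer would get past the KeyError, something like the sketch below (same placeholder file names), but each worker would still read its own copy of the files:

from concurrent.futures import ProcessPoolExecutor
import pandas as pd

variables = {}

def init_worker(distances_path, position_path):
    # runs once in each worker process; every worker loads its own copy
    variables['distances'] = pd.read_csv(distances_path)
    variables['position'] = pd.read_csv(position_path)

def test(n):
    return variables['position'].shape[0]

def main():
    with ProcessPoolExecutor(5,
                             initializer=init_worker,
                             initargs=('very_large_file.csv',
                                       'another_very_large_file.csv')) as executor:
        for result in executor.map(test, range(5)):
            print(result)

if __name__ == '__main__':
    main()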

Loading the files separately in each process takes way too much RAM and my PC can't handle it, so I want to know how the workers can access the objects that are already loaded, instead of each one creating a new copy.
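From what I've read, one option might be the fork start method (Linux only), where child processes inherit the parent's memory copy-on-write instead of pickling it, so something like this sketch might let the workers see the dataframes that were loaded in main():

from concurrent.futures import ProcessPoolExecutor
import multiprocessing as mp
import pandas as pd

variables = {}

def test(n):
    # forked workers inherit `variables` from the parent process,
    # so nothing has to be pickled or re-read here
    return variables['position'].shape[0]

def main():
    variables['distances'] = pd.read_csv('very_large_file.csv')
    variables['position'] = pd.read_csv('another_very_large_file.csv')

    ctx = mp.get_context('fork')  # not available on Windows
    with ProcessPoolExecutor(5, mp_context=ctx) as executor:
        for result in executor.map(test, range(5)):
            print(result)

if __name__ == '__main__':
    main()

But fork isn't available on Windows and I've read it can be fragile, so I'm not sure this is the right approach.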
