Concatenating thousands of dataframes in the last Dask step triggers a memory error


My Dask script runs well until the last step, which concatenates thousands of dataframes and writes the result to CSV. Memory use immediately jumps from 6 GB to over 15 GB and I receive an error along the lines of "95% memory exceeded, restarting workers", even though my machine has plenty of memory. I have two questions: (1) how can I increase the memory available to the workers, or to this last step in particular? (2) Would intermediate concat steps help, and what is the best way to add them? The problematic code is below:

future = client.submit(pd.concat, tasks)
future.result().to_csv(path)
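A minimal sketch of what I have in mind for both questions is below, assuming a local dask.distributed cluster like the one in my script. The `make_partial_frame` helper, the worker count, the `8GiB` limit, the chunk size of 100, and the output path are all illustrative placeholders, not values from my actual code; the idea is simply to raise the per-worker memory limit and to concatenate in chunks on the workers so no single task holds everything at once.

import pandas as pd
from dask.distributed import Client

def make_partial_frame(i):
    # Stand-in for whatever produces each partial dataframe in the real script.
    return pd.DataFrame({"task": [i], "value": [i * 2]})

# (1) Raise the per-worker memory ceiling when creating the local cluster.
client = Client(n_workers=4, memory_limit="8GiB")  # hypothetical sizing

# `tasks` stands in for the thousands of futures produced earlier in the script.
tasks = [client.submit(make_partial_frame, i) for i in range(5000)]

# (2) Intermediate concat steps: reduce in chunks on the workers so the final
# concat only sees a few dozen partial results instead of thousands of frames.
chunk_size = 100
intermediate = [
    client.submit(pd.concat, tasks[i:i + chunk_size])
    for i in range(0, len(tasks), chunk_size)
]

# Final concat combines ~50 intermediate frames, then the result is written out.
final = client.submit(pd.concat, intermediate)
final.result().to_csv("combined.csv")  # placeholder path

Is this roughly the right approach, or would it be better to avoid pulling everything into one task at all, e.g. by building a dask.dataframe with dask.dataframe.from_delayed and letting its to_csv write parts from the workers?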
