I am having trouble with multiprocessing's Pool. In the past, for a setup very similar to the code below, pool.map() iterated through the entire list (I could be wrong, and perhaps something else processed the remaining items?), but that does not seem to be the case here. The code works, but only for the first 16 items, 16 being the number of cores on my machine.

What is the expected behavior for code that is set up as follows:

def export_task(item):
    subject, outputPathChunk = item
    subject.export_hdf5(outputPathChunk)

And then

import multiprocessing

pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
pool.map(export_task, subs)
pool.close()

Here subs is a list of 600 tuples, each containing a vaex table (a pandas alternative for larger datasets) and an output path.
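For context, subs is built roughly like this (the file names and the range are illustrative, not my exact code; each entry pairs an open vaex DataFrame with the path its chunk should be exported to):

import vaex

# illustrative only -- the real tables and paths come from my pipeline
subs = [(vaex.open(f"input_chunk_{i}.hdf5"), f"output_chunk_{i}.hdf5") for i in range(600)]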

There is a vaex-related warning printed for each of the first 16 executions of export_task, and I am wondering if that is choking pool.map. That would be a simple issue to work around, but a plain sample_table.export_hdf5(sample_path) sanity check outside the pool does not produce the same warning.
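The sanity check is essentially this, run in the main process with a single table (assuming the first tuple in subs is representative):

sample_table, sample_path = subs[0]
sample_table.export_hdf5(sample_path)  # no warning when run outside the pool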

Is the pool stalling because the function does not return a value and only does file I/O, or is it because of the vaex warning that only appears inside the pool?
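For reference, this is the kind of diagnostic variant I could switch to so that per-item results (and any worker exceptions) surface as they complete; the only changes from the code above are returning the path and iterating over imap_unordered:

import multiprocessing

def export_task(item):
    subject, outputPathChunk = item
    subject.export_hdf5(outputPathChunk)
    return outputPathChunk  # return the path so the parent can see which items finished

if __name__ == "__main__":
    # subs is the same list of (vaex table, path) tuples as above
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        # imap_unordered yields results (or re-raises worker exceptions) as each task
        # finishes, so a stall or error after the first 16 items would be visible here
        for done_path in pool.imap_unordered(export_task, subs):
            print("finished", done_path)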
