Caching a data frame in joblib

Joblib can share NumPy arrays across processes by automatically memmapping them. However, this relies on NumPy-specific facilities. Pandas does use NumPy under the hood, but unless all your columns have the same data type, you can't really serialize a DataFrame to a single NumPy array.
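To illustrate the dtype problem, here is a small sketch. The column names and values are made up for the example. Converting a mixed-type DataFrame to one array falls back to `object` dtype, whereas purely numeric columns still produce a plain numeric array.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an integer, a float, and a string column.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [0.5, 1.5, 2.5],
    "c": ["x", "y", "z"],
})

# Mixed column types cannot share one numeric dtype, so the result
# is an array of Python objects.
mixed = df.to_numpy()
print(mixed.dtype)

# Restricting to the numeric columns gives a homogeneous float array.
homogeneous = df[["a", "b"]].to_numpy()
print(homogeneous.dtype)
```

An `object` array is pickled rather than memmapped, which is why the single-array route does not help here.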

What would be the "right" way to cache a DataFrame for reuse in Joblib?

My best guess would be to memmap each column separately, then reconstruct the DataFrame inside the loop (and pray that pandas doesn't copy the data). But that seems like a pretty heavyweight process.
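The per-column idea could be sketched roughly as follows. This is only an illustration, not an established recipe: the helper names (`dump_columns`, `load_columns`, `rebuild_frame`) and the file layout are hypothetical, and it assumes every column is numeric, since object columns are pickled rather than memmapped. Whether pandas reuses the memmapped buffers or copies them depends on the pandas version, so `copy=False` is a request, not a guarantee.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd


def dump_columns(df, folder):
    """Write each column to its own uncompressed joblib file."""
    paths = {}
    for i, name in enumerate(df.columns):
        path = os.path.join(folder, f"col_{i}.joblib")
        joblib.dump(df[name].to_numpy(), path)
        paths[name] = path
    return paths


def load_columns(paths, mmap_mode="r"):
    """Open each column file as a read-only memory map."""
    return {name: joblib.load(path, mmap_mode=mmap_mode)
            for name, path in paths.items()}


def rebuild_frame(columns):
    """Reassemble a DataFrame; pandas may or may not copy the buffers."""
    return pd.DataFrame(columns, copy=False)


# Hypothetical numeric-only frame for the demonstration.
df = pd.DataFrame({
    "a": np.arange(5, dtype=np.int64),
    "b": np.linspace(0.0, 1.0, 5),
})

folder = tempfile.mkdtemp()
paths = dump_columns(df, folder)
columns = load_columns(paths)
restored = rebuild_frame(columns)
```

Only the small `paths` dict would need to be passed to each worker, which then calls `load_columns` and `rebuild_frame` itself, so the column data is not pickled per task.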

I am aware of the standalone Memory class, but it's not clear if that can help.
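For reference, a minimal sketch of what `Memory` does: it caches a function's return value on disk, keyed by the function's arguments. The function `build_frame` and its contents are made up for the example. Note that this is disk caching of results, which is a different thing from sharing one in-memory buffer across worker processes.

```python
import tempfile

import numpy as np
import pandas as pd
from joblib import Memory

# Cache directory chosen for the example; any writable path would do.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)


@memory.cache
def build_frame(n_rows):
    """Hypothetical expensive step that produces a DataFrame."""
    rng = np.random.default_rng(0)
    return pd.DataFrame({
        "id": np.arange(n_rows),
        "value": rng.normal(size=n_rows),
    })


first = build_frame(1000)   # computed and written to the cache
second = build_frame(1000)  # same arguments, normally served from the cache
```

Each cached call still returns its own unpickled DataFrame, so this saves recomputation but not memory.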