I'm creating a Pandas DataFrame in which each (index, column) location can hold a numpy ndarray of arbitrary shape, or even a simple number.
This works:
import numpy as np, pandas as pd
x = pd.DataFrame([[np.random.rand(100, 100, 20, 2), 3], [2, 2], [3, 3], [4, 4]],
index=['A1', 'B2', 'C3', 'D4'], columns=['data', 'data2'])
print(x)
but takes 50 seconds to create on my computer! Why?
np.random.rand(100, 100, 20, 2) alone is super fast (< 1 second to create)
How can I speed up the creation of Pandas DataFrames containing ndarrays of various shapes?
It's not actually the creation that is the issue, it's the print statement. 1000 loops of the creation take 2.8 seconds on my computer, but one iteration of the print takes about 26 seconds.
Interestingly, print(x['data2']), print(x['data']['A1']) and print(x['data']['B2']) are all basically instantaneous. So it seems print is having an issue figuring out how to display items of vastly different size. Perhaps a bug?
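To check this on your own machine, here is a rough sketch that times the creation and the string formatting separately, then builds a lightweight view that is cheap to print. It assumes a much smaller array than in the question so that it finishes quickly, and summarize is a hypothetical helper name, not part of pandas. Timings will vary with the pandas version.

```python
import time
import numpy as np
import pandas as pd

# Smaller array than in the question so the sketch runs quickly;
# use (100, 100, 20, 2) to reproduce the original case.
big = np.random.rand(20, 20, 5, 2)

start = time.perf_counter()
x = pd.DataFrame([[big, 3], [2, 2], [3, 3], [4, 4]],
                 index=['A1', 'B2', 'C3', 'D4'], columns=['data', 'data2'])
creation_seconds = time.perf_counter() - start

start = time.perf_counter()
full_text = repr(x)  # the text that print(x) would write
repr_seconds = time.perf_counter() - start

def summarize(cell):
    # Hypothetical helper: describe an ndarray by its shape, keep other values.
    if isinstance(cell, np.ndarray):
        return f"ndarray{cell.shape}"
    return cell

# Apply the helper cell by cell, one column at a time.
summary = x.apply(lambda col: col.map(summarize))

print(f"creation: {creation_seconds:.4f} s, repr: {repr_seconds:.4f} s")
print(summary)
```

Printing summary instead of x avoids formatting the large array at all, while x itself still holds the full data.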