I was trying to store a list of NumPy arrays as individual elements of a pandas DataFrame, using:
df.loc[df['B']==4,'A'] = [np.array([5, 6, 7, 8]),np.array([2,3])]
Whether or not this is allowed seems to depend on how I initialise df:
(Screenshot: testing two different initialisations of df)
Can someone explain to me what's going on?
Here's the code as text for everyone to try:
The code that's not working
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['A','B'])
a = [1,2,0,4,5]
b = [3,4,4,7,3]
df['A'] = a
df['B'] = b
df.loc[df['B']==4,'A'] = [np.array([5, 6, 7, 8]),np.array([2,3])]
df
The code that's working
df = pd.DataFrame(columns=['A','B'])
a = [1,2,0,4,5]
b = [3,4,4,7,3]
for i in range(len(a)):
    df.loc[i,'A'] = a[i]
    df.loc[i,'B'] = b[i]
df.loc[df['B']==4,'A'] = [np.array([5, 6, 7, 8]),np.array([2,3])]
df
You have to create a Series with the correct index (the index labels of the rows selected by the mask) and assign that instead of a plain list.
Note that in a future version of pandas this might raise an error, since the original dtype of A is integer. You would first need to convert the column to object.
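The approach described above can be sketched as follows. This is a minimal example, assuming the same data as in the question; the names `mask` and `arrays` are introduced here for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built directly from a dict.
df = pd.DataFrame({'A': [1, 2, 0, 4, 5], 'B': [3, 4, 4, 7, 3]})

# Column A starts as integer; convert it to object so it can hold arrays.
df['A'] = df['A'].astype(object)

# `mask` and `arrays` are illustrative names, not from the original post.
mask = df['B'] == 4
arrays = [np.array([5, 6, 7, 8]), np.array([2, 3])]

# Index the Series by the selected row labels so that each array
# is aligned to exactly one cell of column A.
df.loc[mask, 'A'] = pd.Series(arrays, index=df.index[mask])

print(df)
```

Because the Series carries the row labels, pandas aligns on the index and treats each array as a single cell value rather than trying to interpret the list as a 2D block.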
Why does the second approach work?
Not sure, most likely due to a peculiar internal state of the DataFrame (I suspect because it's initialized solely from a loop and an empty object DataFrame), but this is most likely not supposed to work and is very unstable.
For instance, this would fail if you add another column (even an object one).
But creating the DataFrame from a single-block object NumPy array works.
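As a side check on the explanation above, which is only a guess about internal state, one way to look at the difference from the outside is to compare the column dtypes produced by the two initialisations in the question. The names `df_cols` and `df_loop` are introduced here for illustration. The column-assigned frame ends up with integer columns; what the loop-built frame reports may depend on the pandas version, so no output is claimed for it.

```python
import pandas as pd

a = [1, 2, 0, 4, 5]
b = [3, 4, 4, 7, 3]

# Initialisation 1: assign whole columns at once.
df_cols = pd.DataFrame(columns=['A', 'B'])
df_cols['A'] = a
df_cols['B'] = b

# Initialisation 2: grow the frame cell by cell from an empty frame.
df_loop = pd.DataFrame(columns=['A', 'B'])
for i in range(len(a)):
    df_loop.loc[i, 'A'] = a[i]
    df_loop.loc[i, 'B'] = b[i]

# Compare the resulting column dtypes of the two frames.
print(df_cols.dtypes)
print(df_loop.dtypes)
```

If the two frames report different dtypes, that would be consistent with the idea that the list assignment only goes through when the target column is already able to hold arbitrary Python objects.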