Pandas .fillna() not working with .sample()

I have a data set with a column state whose unique values consist of ['released', 'isolated', 'deceased', nan]. I've tried to impute missing data using random sampling, like so:

for column in ['sex','state','city']:
    df[column].fillna(df[column].sample(), inplace=True)

The sex column appears to have properly imputed; there is no more missing sex data. The state column, however, does not appear to impute. When I examine the column, I receive the following:

In [1]: df['state'].sample()
Out[1]: 1391    released
Name: state, dtype: object

So the column is named correctly in the imputation loop above. When I try the same fillna on a fresh DataFrame, I get a series of NaNs:

In [2]: new=pd.DataFrame({'blank':[np.nan for i in range(0,100)]})
In [3]: new['blank'].fillna(df['state'].sample())
Out[3]: 
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
      ..
95   NaN
96   NaN
97   NaN
98   NaN
99   NaN
Name: blank, Length: 100, dtype: float64

Why is the state column not properly sampling for the fillna()?

There is 1 answer

BENY (best answer)

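You can't fill with a Series here, because fillna aligns the Series with the column on index labels. df['state'].sample() returns a one-row Series, so at most the single row carrying that label (1391 in your Out[1]) is filled; new has labels 0 to 99, so unless the sampled label falls in that range nothing changes. Pull the value out as a scalar with .iloc[0] instead.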
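
To see the alignment in isolation, here is a minimal sketch on made-up data. The names target and filler are hypothetical, not from the question, and the filler's label 3 is chosen by hand rather than sampled.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the column being filled; index labels are 0..4.
target = pd.Series([np.nan] * 5, name="blank", dtype="object")

# A one-row Series shaped like the result of .sample(); its only label is 3.
filler = pd.Series(["released"], index=[3], name="state")

# fillna aligns on index labels, so only the row labelled 3 is filled.
aligned = target.fillna(filler)

# A scalar carries no index, so it is applied to every missing row.
scalar_filled = target.fillna(filler.iloc[0])

print(aligned)
print(scalar_filled)
```

If the filler's label were outside 0..4, aligned would stay entirely NaN, which matches what the question shows.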

new = pd.DataFrame({'blank': [np.nan] * 100})

# dropna() keeps NaN from being sampled; .iloc[0] turns the one-row Series into a scalar
new['blank'].fillna(df['state'].dropna().sample().iloc[0])
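
Note that the scalar approach writes the same sampled value into every missing row. If the goal is an independent draw for each missing row, one possible sketch follows. impute_by_sampling is a hypothetical helper, not a pandas function, and the toy df only stands in for the real data.

```python
import numpy as np
import pandas as pd

def impute_by_sampling(series, random_state=None):
    """Fill each missing value with an independent draw from the observed values."""
    missing = series.isna()
    observed = series.dropna()
    # Nothing to fill, or nothing to draw from: return an unchanged copy.
    if not missing.any() or observed.empty:
        return series.copy()
    # One draw per missing row, with replacement.
    draws = observed.sample(n=int(missing.sum()), replace=True,
                            random_state=random_state)
    filled = series.copy()
    # .to_numpy() discards the sampled index, so the assignment is positional
    # instead of label-aligned.
    filled.loc[missing] = draws.to_numpy()
    return filled

# Hypothetical toy data standing in for the question's df.
df = pd.DataFrame({"state": ["released", np.nan, "isolated", np.nan, "deceased", np.nan]})
df["state"] = impute_by_sampling(df["state"], random_state=0)
print(df)
```

The same helper could be applied in the original loop over ['sex', 'state', 'city'], assigning the result back to df[column] rather than relying on inplace=True.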