I have a data set with a column `state` whose unique values consist of `['released', 'isolated', 'deceased', nan]`. I've tried to impute missing data using random sampling, like so:
for column in ['sex','state','city']:
    df[column].fillna(df[column].sample(), inplace=True)
The `sex` column appears to have been properly imputed; there is no more missing `sex` data. The `state` column, however, does not appear to be imputed. When I examine the column, I receive the following:
In [1]: df['state'].sample()
Out[1]: 1391 released
Name: state, dtype: object
So the column is appropriately named in the imputation loop above. When I attempt the same on a raw dataframe, I receive a similar series of `NaN`s:
In [2]: new=pd.DataFrame({'blank':[np.nan for i in range(0,100)]})
In [3]: new['blank'].fillna(df['state'].sample())
Out[3]:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
..
95 NaN
96 NaN
97 NaN
98 NaN
99 NaN
Name: blank, Length: 100, dtype: float64
Why is the `state` column not properly sampling for the `fillna()`?
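You cannot use `fillna` with a `Series` here, since it will be matched on the index: the one-row Series returned by `sample()` can only fill a missing value that sits at its own index label, so every other `NaN` is left untouched.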
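A minimal sketch of the alignment behaviour and one possible workaround, assuming the goal is to draw an independent random replacement for each missing entry. The small `df` below is made-up stand-in data, and `impute_by_sampling` is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd

# Index alignment: a one-row Series only fills the NaN at its own label.
blank = pd.Series([np.nan, np.nan, np.nan], dtype=object)
one_row = pd.Series(["released"], index=[1])
aligned = blank.fillna(one_row)  # only label 1 is filled; 0 and 2 stay NaN

# Made-up stand-in for the data in the question.
df = pd.DataFrame(
    {"state": ["released", np.nan, "isolated", np.nan, "deceased", np.nan]}
)

def impute_by_sampling(s, rng):
    """Hypothetical helper: fill NaNs with random draws from the observed values."""
    missing = s.isna()
    observed = s[~missing].to_numpy()
    filled = s.copy()
    # Assign a plain NumPy array so no index alignment takes place.
    filled[missing] = rng.choice(observed, size=int(missing.sum()), replace=True)
    return filled

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
df["state"] = impute_by_sampling(df["state"], rng)
```

The same helper could be applied in a loop over `['sex', 'state', 'city']`. Note that it assumes each column has at least one observed value to draw from.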