How to fill NaN values in each pandas column with the mean of that column's class


I have a dataset in pandas (say, with two classes).

 index | length | weight | label 
-------|--------|--------|-------
   0   |   1    |   2    |   0
   1   |   2    |   3    |   0
   2   |  nan   |   4    |   0
   3   |   6    |  nan   |   0
   4   |   30   |   40   |   1
   5   |   45   |   35   |   1
   6   |   18   |  nan   |   1

df.fillna(df.mean()) returns a DataFrame in which each NaN is filled with the mean of its whole column. Instead, I want to fill each NaN with the mean of its class within that column, so length at index 2 would be 3 (the mean of 1, 2 and 6 for label 0). The desired output looks like this:

 index | length | weight | label 
-------|--------|--------|-------
   0   |   1    |   2    |   0
   1   |   2    |   3    |   0
   2   |   3    |   4    |   0
   3   |   6    |   3    |   0
   4   |   30   |   40   |   1
   5   |   45   |   35   |   1
   6   |   18   |  37.5  |   1

Is there a simple function for this, or should I implement it myself?

There is 1 answer

Best answer, by jezrael

Use GroupBy.transform with 'mean' to build a helper DataFrame of per-group means that has the same index as the original, then pass it to fillna:

df = df.fillna(df.groupby('label').transform('mean'))
print(df)
   length  weight  label
0     1.0     2.0      0
1     2.0     3.0      0
2     3.0     4.0      0
3     6.0     3.0      0
4    30.0    40.0      1
5    45.0    35.0      1
6    18.0    37.5      1 

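Detail: the helper DataFrame holds, for every row, the mean of that row's group in each column (computed on the original data, before filling):

print(df.groupby('label').transform('mean'))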
   length  weight
0     3.0     3.0
1     3.0     3.0
2     3.0     3.0
3     3.0     3.0
4    31.0    37.5
5    31.0    37.5
6    31.0    37.5
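
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the question's data and applies the same approach. The variable names class_means and filled are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
df = pd.DataFrame({
    "length": [1, 2, np.nan, 6, 30, 45, 18],
    "weight": [2, 3, 4, np.nan, 40, 35, np.nan],
    "label": [0, 0, 0, 0, 1, 1, 1],
})

# Per-class column means, broadcast back onto the original row index.
# The grouping column ("label") is not part of the result.
class_means = df.groupby("label").transform("mean")

# fillna aligns on index and column labels, so only NaN cells are replaced
# and the "label" column is left untouched.
filled = df.fillna(class_means)
print(filled)
```

Note that if a class has no non-missing values at all in some column, its mean is itself NaN, so those cells would stay unfilled and need a separate fallback.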