Imputation by Class Average

I have a dataset in which several variables have many missing values. I want to impute them by class, using the means within the groups defined by a variable with 3 categories. I would like code that performs this imputation for all variables automatically, rather than doing it by hand one variable at a time.

medie_per_classe <- aggregate(. ~ Company_category, data = df, 
    FUN = mean, na.rm = TRUE)

I tried this code, but I get only missing values. The code does not compute the means.

There is 1 answer

Answer by Gregor Thomas

With dplyr, this fills every NA with the mean of its Company_category group, for all numeric columns in your_data:

library(dplyr)

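your_data |>
  mutate(
    # for every numeric column, replace each NA with the mean of the
    # non-missing values in the same group
    across(
      where(is.numeric),
      \(x) coalesce(x, mean(x, na.rm = TRUE))
    ),
    # compute the means separately within each Company_category
    .by = Company_category
  )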
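
To make the behaviour easier to check, here is a minimal sketch on made-up data. The data frame `toy` and its columns `revenue` and `employees` are hypothetical, not from the question. It assumes dplyr 1.1.0 or later (for `.by`) and R 4.1 or later (for `|>` and `\(x)`). A base R version using `ave()` is included for comparison. As a side note on the original attempt: the formula method of `aggregate()` drops rows containing any NA by default (`na.action = na.omit`) and returns only a summary table of group means, so it would not fill in the data even when the means are computed; that may be part of what went wrong, though the exact cause depends on the data.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
toy <- data.frame(
  Company_category = c("A", "A", "A", "B", "B", "C"),
  revenue          = c(10, NA, 30, 5, NA, NA),
  employees        = c(1, 2, NA, NA, 8, 4)
)

# dplyr: fill each NA with the mean of its own Company_category group
imputed <- toy |>
  mutate(
    across(where(is.numeric), \(x) coalesce(x, mean(x, na.rm = TRUE))),
    .by = Company_category
  )

# Base R alternative: ave() applies the function within each group
# and returns the results in the original row order
num_cols <- names(toy)[sapply(toy, is.numeric)]
base_imputed <- toy
base_imputed[num_cols] <- lapply(toy[num_cols], function(x) {
  ave(x, toy$Company_category,
      FUN = function(v) replace(v, is.na(v), mean(v, na.rm = TRUE)))
})

print(imputed)
```

In this sketch, group C has no observed `revenue` at all, so there is no group mean to use and that value stays missing. If that can happen in the real data, a fallback such as the overall column mean would have to be chosen separately.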