how to ignore groups with all NAs while imputing data

Question

48 views Asked by Puneet Sachdeva At 07 April 2023 at 20:45

I have a large panel data set with thousands of rows. I want to group by gvkey and impute the NA values, but in some groups every value is NA. I want to ignore those groups.

These lines of code give me what I want:

set.seed(123)  
fake_data <- data.frame(
  gvkey = rep(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"), each = num_years),
  year = rep(2010:2014, 10),
  dltt = rnorm(50))

for (gvkey in c("A", "B", "D", "E", "F", "G", "H", "I", "J")) {
  year_to_replace <- sample(c(2011, 2012, 2013), size = sample(2:3, 1), replace = FALSE)
  fake_data$dltt[fake_data$gvkey == gvkey & fake_data$year %in% year_to_replace] <- NA
}

fake_data <- fake_data %>%
  arrange(gvkey, year) %>%
  group_by(gvkey) %>%
  mutate(dltt_imputed = na.approx(dltt))

But I get an error if a group consists entirely of NAs:

fake_data$dltt[fake_data$gvkey == "C"] <- NA

fake_data <- fake_data %>%
  arrange(gvkey, year) %>%
  group_by(gvkey) %>%
  mutate(dltt_imputed = na.approx(dltt))

Could someone please help me add a condition to the pipe so that such groups are ignored?

There is 1 answer

**SAL** · Answer 1 · 2023-04-07T22:53:22+00:00

One option is to give mutate() a condition that skips groups whose values are all missing (here, group C) and fills the missing values in every other group by interpolating between that group's own non-missing values. Since num_years is not defined in your question, I assume num_years = 5, based on the total number of values (50) spread over 10 groups.

library(zoo)
library(tidyverse)

num_years <- 5
fake_data$dltt[fake_data$gvkey == "C"] <- NA

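# Keep dltt unchanged when the whole group is NA or the value is observed;
# otherwise take the interpolated value from na.approx().
# Comparing against num_years assumes every group has exactly num_years rows.
fake_data <- fake_data %>%
  arrange(gvkey, year) %>%
  group_by(gvkey) %>%
  mutate(dltt_imputed = ifelse(sum(is.na(dltt)) == num_years | !(is.na(dltt)), dltt,  na.approx(dltt)))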

Note that the new imputed column still contains group C: groups whose values are all missing are not removed, they simply stay NA. I leave it to the asker to decide how to handle those groups afterwards.
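A variant of the same idea that does not depend on a hard-coded num_years can be sketched as follows. This is only a sketch under stated assumptions: approx_if_possible() is a hypothetical helper name, not part of zoo or dplyr, and the test data is a simplified version of the question's fake_data. The helper leaves a group untouched when it has fewer than two observed values, since linear interpolation needs two points, and passes na.rm = FALSE so that na.approx() returns a vector of the same length even when a group starts or ends with NA.

```r
library(dplyr)
library(zoo)

# Hypothetical helper (not from the original answer): interpolate only when
# the group has at least two observed values; otherwise return x unchanged.
approx_if_possible <- function(x) {
  if (sum(!is.na(x)) < 2) return(x)
  # na.rm = FALSE keeps leading/trailing NAs so the length matches the group
  zoo::na.approx(x, na.rm = FALSE)
}

# Small reproducible panel, assuming num_years = 5 as in the answer above.
set.seed(123)
num_years <- 5
fake_data <- data.frame(
  gvkey = rep(LETTERS[1:10], each = num_years),
  year  = rep(2010:2014, times = 10),
  dltt  = rnorm(10 * num_years)
)
# Group A gets two interior gaps; group C is entirely missing.
fake_data$dltt[fake_data$gvkey == "A" & fake_data$year %in% c(2011, 2012)] <- NA
fake_data$dltt[fake_data$gvkey == "C"] <- NA

imputed <- fake_data %>%
  arrange(gvkey, year) %>%
  group_by(gvkey) %>%
  mutate(dltt_imputed = approx_if_possible(dltt)) %>%
  ungroup()

# Optional follow-up: drop groups that are still entirely missing.
imputed_complete <- imputed %>%
  group_by(gvkey) %>%
  filter(!all(is.na(dltt_imputed))) %>%
  ungroup()
```

Whether the all-NA groups should be kept as NA (imputed) or dropped (imputed_complete) depends on what the later analysis expects, so both are shown.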
