Creating a new column in R based on a condition from grouped variables in another column

I have a large dataset with over 300 variables. I would like to create a new column in that dataset (in R), based on a condition evaluated within groups defined by another variable.

a <- c("Yes", "No", "No", "No", "Yes", "No")
b <- c(1,1,1,2,2,2)
df <- data.frame("Infected" = a, "Household" = b)

For example, I would like to create a third column, "Living in an infected household", that is "Yes" if anybody in that household is infected. In the simple example above, the third column should contain six "Yes" values, because there is at least one infected person in each household.

I tried the following, and a few other examples, but to no avail.

df2 <- df %>%
  group_by(Household) %>%
  mutate(Infected_HH = case_when(('Yes' %in% Infected)) ~ 'Yes',
                          (!('Yes' %in% Infected) ~ 'No')) %>%
  ungroup()
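
For reference, here is a minimal sketch of how this first attempt might look with balanced parentheses, assuming dplyr is installed and using the example data above. In the attempt, the second closing parenthesis after Infected appears to end the case_when() call before the ~ is reached, so the formula is never passed to it. The column name Infected_HH is taken from the attempt; the TRUE ~ "No" fallback is my assumption for the "nobody infected" case.

```r
library(dplyr)

a <- c("Yes", "No", "No", "No", "Yes", "No")
b <- c(1, 1, 1, 2, 2, 2)
df <- data.frame(Infected = a, Household = b)

df2 <- df %>%
  group_by(Household) %>%
  # "Yes" %in% Infected is a single TRUE/FALSE per household;
  # mutate() recycles the length-one result to every row of the group
  mutate(Infected_HH = case_when("Yes" %in% Infected ~ "Yes",
                                 TRUE ~ "No")) %>%
  ungroup()
```

With this data, both households contain at least one "Yes", so every row of Infected_HH should come out as "Yes".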

I also tried the following with the original data - again to no avail.

df2 <- Final_Raw_In %>% 
  group_by(Household_ID.x) %>% 
  mutate(Infected_household = case_when(
    if_any(Infected_qPCR) == 'Yes' ~ "Yes",
    if_any(Infected_qPCR) == 'No' ~ "No"))

The following attempt gave me the values I wanted, but the new column was labelled with the expression itself ("case_when(any .......") and the result contained only the columns used in the transmute call.

a <- c(1,1,1,2,2,2)
b <- c("Yes", "No", "No", "No", "No", "No")
df <- data.frame("Infected" = b, "Household" = a)

df2 <- df %>%
  group_by(Household) %>%
  transmute(case_when(any(Infected == 'Yes') ~ "Yes",
                      TRUE ~ "No"))
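
As far as I understand it, transmute() keeps only the columns it creates (plus the grouping variable), and an unnamed expression is labelled with its own text, which would explain both symptoms. A sketch of the same logic with mutate() and an explicit name, assuming dplyr is installed (Infected_HH is just the name borrowed from my earlier attempt):

```r
library(dplyr)

a <- c(1, 1, 1, 2, 2, 2)
b <- c("Yes", "No", "No", "No", "No", "No")
df <- data.frame(Infected = b, Household = a)

df2 <- df %>%
  group_by(Household) %>%
  # Naming the expression sets the column name;
  # mutate() keeps all existing columns, unlike transmute()
  mutate(Infected_HH = case_when(any(Infected == "Yes") ~ "Yes",
                                 TRUE ~ "No")) %>%
  ungroup()
```

Here household 1 has one infected member and household 2 has none, so Infected_HH should be "Yes" for the first three rows and "No" for the last three, with the Infected and Household columns still present.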

Thanks
