I have a large dataset with over 300 variables. I would like to create a new column in that dataset (in R), based on a condition evaluated within the groups of another variable.
a <- c("Yes", "No", "No", "No", "Yes", "No")
b <- c(1,1,1,2,2,2)
df <- data.frame("Infected" = a, "Household" = b)
For example, I would like to create a third column, "Living in an infected household", that is "Yes" if anybody in that household is infected. In the simple example above, the third column should contain six "Yes" values, because there is at least one infected person in each household.
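To make the target concrete, here is a minimal base R sketch that builds the expected column for the toy data above. It assumes ave() is an acceptable way to apply any() within each household, and the column name Infected_HH is used purely for illustration; it is not meant as the definitive approach for the full dataset.

```r
# Toy data from the example above
df <- data.frame(
  Infected  = c("Yes", "No", "No", "No", "Yes", "No"),
  Household = c(1, 1, 1, 2, 2, 2)
)

# TRUE for every row whose household contains at least one "Yes"
any_infected <- ave(df$Infected == "Yes", df$Household, FUN = any)

# Illustrative column name (an assumption, not part of the original data)
df$Infected_HH <- ifelse(any_infected, "Yes", "No")

print(df)
```

Because both households contain an infected person, every row should end up flagged "Yes".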
I tried the following, and a few other examples, but to no avail.
df2 <- df %>%
  group_by(Household) %>%
  mutate(Infected_HH = case_when(('Yes' %in% Infected)) ~ 'Yes',
         (!('Yes' %in% Infected) ~ 'No')) %>%
  ungroup()
I also tried the following with the original data - again to no avail.
df2 <- Final_Raw_In %>%
  group_by(Household_ID.x) %>%
  mutate(Infected_household = case_when(
    if_any(Infected_qPCR) == 'Yes' ~ "Yes",
    if_any(Infected_qPCR) == 'No' ~ "No"))
The following attempt gave me the values I wanted, but it named the new column after the whole expression ("case_when(any ...)"), and it returned only the columns involved in the transmute() call rather than the full dataset.
a <- c(1,1,1,2,2,2)
b <- c("Yes", "No", "No", "No", "No", "No")
df <- data.frame("Infected" = b, "Household" = a)
df2 <- df %>%
  group_by(Household) %>%
  transmute(case_when(any(Infected == 'Yes') ~ "Yes",
                      TRUE ~ "No"))
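A minimal sketch of one possible fix, assuming dplyr is installed and loaded: giving the same case_when() expression a name inside mutate(), in place of transmute(), should keep every existing column and add a properly named one. The name Infected_HH is only illustrative.

```r
library(dplyr)

# Same toy data as in the transmute() attempt
df <- data.frame(
  Infected  = c("Yes", "No", "No", "No", "No", "No"),
  Household = c(1, 1, 1, 2, 2, 2)
)

df2 <- df %>%
  group_by(Household) %>%
  # mutate() keeps all existing columns; the name on the left of `=`
  # becomes the column name (Infected_HH is an illustrative choice)
  mutate(Infected_HH = case_when(any(Infected == "Yes") ~ "Yes",
                                 TRUE ~ "No")) %>%
  ungroup()

print(df2)
```

transmute() keeps only the columns it creates (plus the grouping variables), and an unnamed expression is used verbatim as the column name, which would explain both symptoms described above.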
Thanks