Add new column as result of a condition between groups in dplyr


I need to know whether each person belongs to a single group or to several groups, and add a new column of boolean values that describes this condition.

Example data:

df <- structure(list(group = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 2L, 
1L, 3L), person = c(955563L, 955563L, 855563L, 855563L, 744506L, 
744506L, 744506L, 444506L, 444506L, 555563L, 555563L)), .Names = c("group", 
"person"), row.names = c(NA, -11L), class = "data.frame")

Result:

group   person  same_group
1   955563  TRUE
1   955563  TRUE
2   855563  TRUE
2   855563  TRUE
3   744506  TRUE
3   744506  TRUE
3   744506  TRUE
1   444506  FALSE
2   444506  FALSE
1   555563  FALSE
3   555563  FALSE

I think some dplyr window functions could do it, but I cannot figure out how. Thanks in advance.


There are 3 answers

1
akrun On BEST ANSWER

Try

library(dplyr)
df %>% 
   group_by(person) %>%
   mutate(same_group=n_distinct(group)==1)
#    group person same_group
#1      1 955563       TRUE
#2      1 955563       TRUE
#3      2 855563       TRUE
#4      2 855563       TRUE
#5      3 744506       TRUE
#6      3 744506       TRUE
#7      3 744506       TRUE
#8      1 444506      FALSE
#9      2 444506      FALSE
#10     1 555563      FALSE
#11     3 555563      FALSE
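
To see why grouping by person works, it may help to look at the intermediate per-person count first. The following is only a sketch: it rebuilds the question's data with data.frame(), and the names per_person, n_groups and result are made up for illustration.

```r
library(dplyr)

# Same data as in the question, rebuilt with data.frame()
df <- data.frame(
  group  = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 2L, 1L, 3L),
  person = c(955563L, 955563L, 855563L, 855563L, 744506L, 744506L,
             744506L, 444506L, 444506L, 555563L, 555563L)
)

# One row per person: the number of distinct groups that person appears in
per_person <- df %>%
  group_by(person) %>%
  summarise(n_groups = n_distinct(group))

# mutate() keeps all 11 rows and repeats the per-person result on each of them
result <- df %>%
  group_by(person) %>%
  mutate(same_group = n_distinct(group) == 1) %>%
  ungroup()
```

The ungroup() at the end is optional; it just returns an ungrouped result so that later steps are not silently computed per person.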

A similar option using data.table is

library(data.table)#v1.9.5+
setDT(df)[, same_group := uniqueN(group)==1 , by = person]
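
If neither package is available, the same per-person logic can probably be sketched in base R with ave(). This is an alternative that none of the answers propose, shown only for comparison; df is the question's data rebuilt with data.frame().

```r
# Same data as in the question, rebuilt with data.frame()
df <- data.frame(
  group  = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 2L, 1L, 3L),
  person = c(955563L, 955563L, 855563L, 855563L, 744506L, 744506L,
             744506L, 444506L, 444506L, 555563L, 555563L)
)

# ave() applies the function within each person and returns a vector as long
# as the input; the single count per person is recycled over that person's rows.
n_groups <- ave(df$group, df$person, FUN = function(g) length(unique(g)))

df$same_group <- n_groups == 1
```

Unlike the data.table version, which adds the column by reference, this assigns the new column with an ordinary `$<-`.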
1
agstudy On

Another data.table option, using ifelse and unique:

setDT(df)[,same_group:= ifelse(length(unique(group))==1,TRUE,FALSE),person]

#    group person same_group
# 1:     1 955563       TRUE
# 2:     1 955563       TRUE
# 3:     2 855563       TRUE
# 4:     2 855563       TRUE
# 5:     3 744506       TRUE
# 6:     3 744506       TRUE
# 7:     3 744506       TRUE
# 8:     1 444506      FALSE
# 9:     2 444506      FALSE
#10:     1 555563      FALSE
#11:     3 555563      FALSE
0
Eric On
df %>% group_by(group, person) %>% mutate(same_group = n() > 1) 

This would lead to slightly different output from the currently accepted answer, but it's not clear from your example what your desired output is. Example:

> df <- data_frame(group = c(1, 1, 2), person = c(123, 123, 123))
> df %>% group_by(group, person) %>% mutate(same_group = n() > 1) 
Source: local data frame [3 x 3]
Groups: group, person

  group person same_group
1     1    123       TRUE
2     1    123       TRUE
3     2    123      FALSE
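
To make the difference concrete, here is a sketch that runs both approaches on the three-row example above. It uses data.frame() instead of data_frame(), and the names df2, by_person and by_pair are made up for illustration.

```r
library(dplyr)

# The example from this answer: one person who appears in two groups
df2 <- data.frame(group = c(1, 1, 2), person = c(123, 123, 123))

# Accepted answer: is the person confined to a single group?
by_person <- df2 %>%
  group_by(person) %>%
  mutate(same_group = n_distinct(group) == 1) %>%
  ungroup()

# This answer: does the (group, person) pair occur more than once?
by_pair <- df2 %>%
  group_by(group, person) %>%
  mutate(same_group = n() > 1) %>%
  ungroup()

by_person$same_group  # FALSE FALSE FALSE
by_pair$same_group    # TRUE TRUE FALSE
```

On the data in the question the two approaches happen to agree, because every person who stays in one group has more than one row and every person who spans groups has exactly one row per group. Which one is right depends on whether the flag should describe the person or the individual row.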