merge or mutate a summary (dplyr)


I am always unsure how to retrieve a summary with dplyr.

Let us suppose I have a dataset of individuals and households.

dta = rbind(c(1, 1, 45), 
  c(1, 2, 47), 
  c(2, 1, 24),
  c(2, 2, 26), 
  c(3, 1, 67), 
  c(4, 1, 20),
  c(4, 2, 21),
  c(4, 3, 7)
 ) 
dta = as.data.frame(dta)
colnames(dta) = c('householdid', 'id', 'age')

 householdid id age
           1  1  45
           1  2  47
           2  1  24
           2  2  26
           3  1  67
           4  1  20
           4  2  21
           4  3   7

Imagine I want to calculate the number of people in each household and the mean age per household, and then reuse this information in the original dataset.

library(dplyr)

dta %>% 
  group_by(householdid) %>% 
  summarise(nhouse = n(), meanAgeHouse = mean(age)) %>% 
  merge(., dta, all = TRUE)

I often use merge, but it can be slow when the dataset is huge.
Is it possible to use mutate instead of merge?

There is 1 answer

3pitt On
dta %>% 
  group_by(householdid) %>% 
  mutate(nhouse = n(), meanAgeHouse = mean(age))
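To make the difference concrete, here is a minimal sketch, assuming dplyr is installed and that the last row of the example belongs to household 4 (as in the printed table). It rebuilds the data, applies the grouped mutate, and compares the result with the summarise-then-join route. The names with_mutate, house_summary and with_join are only illustrative.

```r
library(dplyr)

# Example data from the question (last row assumed to be household 4, id 3)
dta <- data.frame(
  householdid = c(1, 1, 2, 2, 3, 4, 4, 4),
  id          = c(1, 2, 1, 2, 1, 1, 2, 3),
  age         = c(45, 47, 24, 26, 67, 20, 21, 7)
)

# Grouped mutate: each summary is computed once per household
# and repeated on every row of that household, so no join is needed
with_mutate <- dta %>%
  group_by(householdid) %>%
  mutate(nhouse = n(), meanAgeHouse = mean(age)) %>%
  ungroup()

# Summarise first, then join the per-household table back (for comparison)
house_summary <- dta %>%
  group_by(householdid) %>%
  summarise(nhouse = n(), meanAgeHouse = mean(age))

with_join <- left_join(dta, house_summary, by = "householdid")

# Both routes should give the same rows and columns
all.equal(as.data.frame(with_mutate), as.data.frame(with_join))
```

The ungroup() at the end is optional, but it keeps later steps from silently operating per household.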