How to summarize table in R, according to each ID

381 views Asked by At

So I have a table structured like this

id   V1    V2
101, 500,   1
101, 600,   1
102, 300,   0
102, 300,   0
102, 400,   0
102, 100,   1
103, 200,   0
103, 400,   0
104, 200,   1

And basically for each id, I want to calculate the mean of V1, and the sum of V2, so the new table should look like this

id   V1    V2
101, 550,  2
102, 275,  1
103, 400,  0
104, 200,  1

If anyone can help out I'd really appreciate that.

1

There are 1 answers

5
akrun On BEST ANSWER

We can use one of the aggregating functions for this kind of problems. Here, I use dplyr. We group_by 'id' and summarise the 'V1' and 'V2' columns with mean and sum of those corresponding columns.

library(dplyr)
df1 %>% 
    group_by(id) %>%
    summarise(V1=mean(V1, na.rm=TRUE), V2= sum(V2, na.rm=TRUE))
#   id  V1 V2
#1 101 550  2
#2 102 275  1
#3 103 300  0
#4 104 200  1

Or another option is data.table. We convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'id', we get the mean and sum of the columns.

library(data.table)
setDT(df1)[, list(V1=mean(V1, na.rm=TRUE), V2= sum(V2, na.rm=TRUE)), by = id]
#    id  V1 V2
#1: 101 550  2
#2: 102 275  1
#3: 103 300  0
#4: 104 200  1

Or using base R

do.call(rbind, by(df1, df1[1], FUN=function(x) 
      data.frame(id=x[1,1], V1= mean(x[,2], na.rm=TRUE), 
                            V2=sum(x[,3], na.rm=TRUE))))