Group rows in a data frame by ID and stack related values as new columns


I'm trying to transform a data frame in which individuals are grouped by family ID. Instead of:

FAMID1 ID1  Age  Sex  Q1  Q2 ......
21     1    18   M    T   4
21     2    21   F    F   2
22     1    13   F    T   3
22     2    16   F    F   1
23     1    18   M    T   3
23     2    18   M    T   3

I want to get:

FAMID  ID1  ID2  Age1  Age2  Sex1  Sex2  Q1.1  Q1.2 ...
21     1    2    18    21     M     F    T     F
22     1    2    13    16     F     F    T     F
23     1    2    18    18     M     M    T     T

I've tried melt (reshape2) and mutate with ddply (plyr), but I have a few hundred variables and I can't get them all onto the same row of the data frame. I don't want to summarise the variables the way aggregate does; I just want to append one row to another and rename every variable to show which member of the family it belongs to.

There are 2 answers

joel.wilson
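# The member index column is called ID1 in the question's data, so it is used
# both as timevar and (to keep ID1.1 / ID1.2 in the result) in v.names.
reshape(df, direction = "wide", idvar = "FAMID1", timevar = "ID1",
        v.names = c("ID1", "Age", "Sex", "Q1", "Q2"))

#   FAMID1 ID1.1 Age.1 Sex.1 Q1.1 Q2.1 ID1.2 Age.2 Sex.2  Q1.2 Q2.2
#1:     21     1    18     M TRUE    4     2    21     F FALSE    2
#2:     22     1    13     F TRUE    3     2    16     F FALSE    1
#3:     23     1    18     M TRUE    3     2    18     M  TRUE    3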
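
Since the question mentions a few hundred variables, here is a minimal self-contained sketch of the same base R reshape() call that builds v.names from the column names instead of typing them out. The data frame is retyped from the six columns shown in the question, and value_cols and wide are names made up for this illustration.

```r
# Example data retyped from the question (only the columns shown there).
df <- data.frame(
  FAMID1 = c(21, 21, 22, 22, 23, 23),
  ID1    = c(1, 2, 1, 2, 1, 2),
  Age    = c(18, 21, 13, 16, 18, 18),
  Sex    = c("M", "F", "F", "F", "M", "M"),
  Q1     = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  Q2     = c(4, 2, 3, 1, 3, 3)
)

# Every column except the family ID gets spread across family members,
# so the list of names is derived rather than written by hand.
value_cols <- setdiff(names(df), "FAMID1")

wide <- reshape(df, direction = "wide", idvar = "FAMID1",
                timevar = "ID1", v.names = value_cols)

# One row per family; each variable gets a ".1" / ".2" suffix per member.
names(wide)
```

This assumes ID1 numbers the members within each family (1, 2, ...) and that each FAMID1/ID1 pair occurs only once.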

user5249203

You can do this with data.table's long-to-wide reshaping (dcast). Various other examples are listed here.

library(data.table)
setDT(df)

If you have many columns, you can select them by position instead of typing them out:

colnames(df)                # inspect the column names
cols <- colnames(df)[-1]    # every column except FAMID1
dcast(df, FAMID1 ~ ID1, value.var = cols)

Output:

   FAMID1 ID1.1_1 ID1.1_2 Age_1 Age_2 Sex_1 Sex_2 Q1_1  Q1_2 Q2_1 Q2_2
1:     21       1       2    18    21     M     F TRUE FALSE    4    2
2:     22       1       2    13    16     F     F TRUE FALSE    3    1
3:     23       1       2    18    18     M     M TRUE  TRUE    3    3
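
For anyone who wants to run this end to end, below is a small sketch of the same dcast() approach on the question's example data. It differs from the code above in one way: ID1 is left out of value.var (the suffix already identifies the member), so the result has no separate ID columns. The names dt, cols and wide are only illustrative.

```r
library(data.table)

# Example data retyped from the question (only the columns shown there).
dt <- data.table(
  FAMID1 = c(21, 21, 22, 22, 23, 23),
  ID1    = c(1, 2, 1, 2, 1, 2),
  Age    = c(18, 21, 13, 16, 18, 18),
  Sex    = c("M", "F", "F", "F", "M", "M"),
  Q1     = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  Q2     = c(4, 2, 3, 1, 3, 3)
)

# Select the value columns by name rather than by position, so the code
# does not depend on FAMID1 and ID1 being the first two columns.
cols <- setdiff(names(dt), c("FAMID1", "ID1"))

# One row per family; each value column becomes <name>_<ID1>.
wide <- dcast(dt, FAMID1 ~ ID1, value.var = cols)
print(wide)
```

Like the original, this relies on each FAMID1/ID1 combination appearing only once; with duplicates, dcast() would fall back to aggregating.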