Transforming Wide for Long Database, Grouping Variables in R

130 views Asked by At

I would like to make a modification to my database. What I am trying to do is transform every row of my database being for a single individual.

I want it to look like this. Example:

ID    massaseca Fator   Anexo     Teor  Tempo
1       334,68     AM       c      0       0
1       344,19     AM       c      0      10
1       347,32     AM       c      0      20 
1       350,2      AM       c      0      30 
1       352,52     AM       c      0      40
.        .          .       .      .       .
.        .          .       .      .       . 

Although I found some examples in the forum, I couldn't solve my problem. I will present both attempts. When I execute the code below, I find that it overrides some variables and is not the way you want.

####################################
require(reshape2)
long <- melt(dados1, id.vars = c("Teor", "Fator"))
melt(dados1, id.vars = 1:2)
melt(dados1, measure.vars = 4:45)
melt(dados1, measure.vars = as.character(10,20,30,
                                         40,50,60,70,80,
                                         90,105,120,135,150,
                                         170,190,210,230,250,
                                         270,290,310,330,350,
                                         360,405,450,510,570,
                                         630,690,750,810,870,
                                         930,990,1050,1110,1170,
                                         1230,1290,1350))

################# CASE 2 ###########
####################################
library(data.table)
long1 <- melt(setDT(dados1), id.vars = c("Teor", "Fator"), variable.name = "Tempo")
long1
melt(setDT(dados1), id.vars = 1:2, variable.name = "Tempo")
melt(setDT(dados1), measure.vars = 4:45, variable.name = "Tempo")
melt(setDT(dados1), measure.vars = as.character(10,20,30,
                                                  40,50,60,70,80,
                                                  90,105,120,135,150,
                                                  170,190,210,230,250,
                                                  270,290,310,330,350,
                                                  360,405,450,510,570,
                                                  630,690,750,810,870,
                                                  930,990,1050,1110,1170,
                                                  1230,1290,1350), variable.name = "Tempo")


However, with the example above it is filling one variable below another, how to do it?

1

There are 1 answers

0
jay.sf On BEST ANSWER

There seems to be an unambiguous "id" column missing, which you may substitute by cbinding the row numbers. Then you could use base reshape where we grap the values for the times argument from the column names. (Note that you need to add 1 to the varyings if you set id as first column as I did.)

res <- reshape(cbind(id=1:nrow(dados1), dados1), 
               varying=5:46, 
               v.names="massaseca",
               timevar="Tempo", 
               times=as.numeric(gsub("X", "", tail(names(dados1), -3))), 
               direction="long", sep="")

Result

head(res[order(res$id), ])
#      id Teor Fator Anexo Tempo massaseca
# 1.0   1    0    Am     c     0    334,68
# 1.10  1    0    Am     c    10    344,19
# 1.20  1    0    Am     c    20    347,32
# 1.30  1    0    Am     c    30     350,2
# 1.40  1    0    Am     c    40    352,52
# 1.50  1    0    Am     c    50    354,81