How can I melt columns while keeping two together?

2.9k views Asked by At

I have data of this wide format that I want to convert to long format

    Cond    Construct   Line    Plant   Tube_shoot  weight_shoot    Tube_root   weight_root
1   Standard            NA      NA      2           199.95          -           -
2   Cd0     IIF         43.1    1       3           51.87           4           10.39
3   Cd0     IIF         43.1    2       5           81.80           6           15.05
4   Cd0     IIF         43.1    3       7           101.56          8           16.70

What I basically want is to keep Tube_shoot and weight_shoot together, i.e. treat these two columns as one for melt. But because I can only use

id.vars=c("Cond","Construct","Line","Plant")

the result is not quite what I want.

I have two (ugly) solutions so far:

  1. I melt twice, first by measure.vars=c("Tube_shoot", "Tube_root" ), then by weights, and later remove half of the rows where the result is simply wrong. This is not feasible for me because I have different lengths of data, and I'd always have to check if I'm taking the right rows out.

  2. I paste "tube" with "weight" into a new column, take out the others, melt them, and later take them apart again.

  3. Copy them one by one in excel. But with hundreds of lines I'd rather learn how to do it in R.

I'm sure that there is a better way.

What I want in the end:

    Cond    Construct   Line    Plant   Tube        weight
1   Standard            NA      NA      2           199.95
2   Cd0     IIF         43.1    1       3           51.87
3   Cd0     IIF         43.1    2       5           81.80
4   Cd0     IIF         43.1    3       7           101.56
2   Cd0     IIF         43.1    1       4           10.39
3   Cd0     IIF         43.1    2       6           15.05
4   Cd0     IIF         43.1    3       8           16.70
3

There are 3 answers

3
akrun On BEST ANSWER

You could try

 res <- reshape(df1, idvar=c('Cond', 'Construct', 'Line', 'Plant'),
              varying=5:8, direction='long', sep="_")

 res1 <-  res[res$weight!='-', -5]
 row.names(res1) <- NULL

 res1
 #      Cond Construct Line Plant Tube weight_shoot
 #1 Standard             NA    NA    2       199.95
 #2      Cd0       IIF 43.1     1    3        51.87
 #3      Cd0       IIF 43.1     2    5         81.8
 #4      Cd0       IIF 43.1     3    7       101.56
 #5      Cd0       IIF 43.1     1    4        10.39
 #6      Cd0       IIF 43.1     2    6        15.05
 #7      Cd0       IIF 43.1     3    8        16.70

data

 df1 <- structure(list(Cond = c("Standard", "Cd0", "Cd0", "Cd0"), 
  Construct = c("", "IIF", "IIF", "IIF"), Line = c(NA, 43.1, 43.1, 43.1),
  Plant = c(NA, 1L, 2L, 3L), Tube_shoot = c(2L, 3L, 5L, 7L), weight_shoot = 
  c(199.95,51.87, 81.8, 101.56), Tube_root = c("-", "4", "6", "8"), 
  weight_root = c("-", "10.39", "15.05", "16.70")), .Names = c("Cond",
  "Construct", "Line", "Plant", "Tube_shoot", "weight_shoot", "Tube_root",
  "weight_root"), class = "data.frame", row.names = c("1", "2", "3", "4"))
7
A5C1D2H2I1M1N2O1R2T1 On

You may want to consider merged.stack from my "splitstackshape" package, with which you can do something like:

library(splitstackshape)
merged.stack(as.data.table(df1, keep.rownames = TRUE), 
             var.stubs = c("Tube", "weight"), sep = "_")
#    rn     Cond Construct Line Plant .time_1 Tube weight
# 1:  1 Standard             NA    NA    root    -      -
# 2:  1 Standard             NA    NA   shoot    2 199.95
# 3:  2      Cd0       IIF 43.1     1    root    4  10.39
# 4:  2      Cd0       IIF 43.1     1   shoot    3  51.87
# 5:  3      Cd0       IIF 43.1     2    root    6  15.05
# 6:  3      Cd0       IIF 43.1     2   shoot    5   81.8
# 7:  4      Cd0       IIF 43.1     3    root    8  16.70
# 8:  4      Cd0       IIF 43.1     3   shoot    7 101.56

Of course, you can also add a [Tube != "-" | weight != "-"] to the end to remove the rows where "Tube" or "weight" have "-"... but note that doing so doesn't magically convert those columns to numeric :-)

0
talat On

Another option, using dplyr and tidyr:

library(dplyr)
libarary(tidyr)

gather(df1, x, Tube, c(Tube_shoot, Tube_root)) %>% 
   mutate(weight = ifelse(grepl("*root$", x), weight_root, weight_shoot)) %>%
   select(-c(weight_shoot, weight_root, x))

#      Cond Construct Line Plant Tube weight
#1 Standard             NA    NA    2 199.95
#2      Cd0       IIF 43.1     1    3  51.87
#3      Cd0       IIF 43.1     2    5   81.8
#4      Cd0       IIF 43.1     3    7 101.56
#5 Standard             NA    NA    -      -
#6      Cd0       IIF 43.1     1    4  10.39
#7      Cd0       IIF 43.1     2    6  15.05
#8      Cd0       IIF 43.1     3    8  16.70