Merge by rownames in data.table


I wanted to use this solution to merge two data.tables by row name. However, it does not work.

z <- matrix(c(0,0,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,1,1,1,1,0,0,0,"RND1","WDR", "PLAC8","TYBSA","GRA","TAF"), nrow=6,
    dimnames=list(c("ILMN_1651838","ILMN_1652371","ILMN_1652464","ILMN_1652952","ILMN_1653026","ILMN_1653103"),c("A","B","C","D","symbol")))

tt <-matrix(c("GO:0002009", 8, 342, 1, 0.07, 0.679, 0, 0, 1, 0, 
        "GO:0030334", 6, 343, 1, 0.07, 0.065, 0, 0, 1, 0,
        "GO:0015674", 7, 350, 1, 0.07, 0.065, 1, 0, 0, 0), nrow=10, dimnames= list(c("GO.ID","LEVEL","Annotated","Significant","Expected","resultFisher","ILMN_1652464","ILMN_1651838","ILMN_1711311","ILMN_1653026")))

z <- as.data.frame(z)
tt <- as.data.frame(tt)

library(data.table)
setDT(z)
setDT(tt)

merge(tt,z["symbol"],by="row.names",all.x=TRUE)

I get the error:

Error in `[.data.table`(z, "symbol") : 
  When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.

How would this work in data.table?

Answer by Merijn van Tilborg

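You can merge the two matrices directly with base merge(). The result is a data.frame in which the row names have become a column named "Row.names". Afterwards you can convert it to a data.table if you like. Do this with the original matrices (or data.frames), because setDT() discards row names unless keep.rownames is set.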

merged <- merge(tt, z, by = "row.names", all = TRUE)

setDT(merged)
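The question actually asks for a left join that adds only the symbol column of z to tt (all.x = TRUE). A minimal sketch of the first route for that case follows. It assumes the original matrices from the question are still available. Renaming Row.names to id with setnames() is optional, and the name id is an arbitrary choice.

```r
library(data.table)

# Matrices from the question; their row names hold the IDs to join on
z <- matrix(c(0,0,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,1,1,1,1,0,0,0,
              "RND1","WDR","PLAC8","TYBSA","GRA","TAF"), nrow = 6,
            dimnames = list(c("ILMN_1651838","ILMN_1652371","ILMN_1652464",
                              "ILMN_1652952","ILMN_1653026","ILMN_1653103"),
                            c("A","B","C","D","symbol")))
tt <- matrix(c("GO:0002009", 8, 342, 1, 0.07, 0.679, 0, 0, 1, 0,
               "GO:0030334", 6, 343, 1, 0.07, 0.065, 0, 0, 1, 0,
               "GO:0015674", 7, 350, 1, 0.07, 0.065, 1, 0, 0, 0), nrow = 10,
             dimnames = list(c("GO.ID","LEVEL","Annotated","Significant",
                               "Expected","resultFisher","ILMN_1652464",
                               "ILMN_1651838","ILMN_1711311","ILMN_1653026")))

# Left join on the row names: keep every row of tt and add only z's "symbol"
# column. drop = FALSE keeps a one-column matrix, so its row names survive.
merged <- merge(tt, z[, "symbol", drop = FALSE], by = "row.names", all.x = TRUE)

setDT(merged)                        # data.frame -> data.table, by reference
setnames(merged, "Row.names", "id")  # optional: rename the join column
merged
```

Rows of tt whose row name does not occur in z (for example ILMN_1711311) get NA in symbol.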

Alternatively, convert each matrix to a data.table first, add its row names as a new column, and then merge the two data.tables on that column.

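merge(
  as.data.table(z)[, id := dimnames(z)[[1]]],   # row names of z as column "id"
  as.data.table(tt)[, id := dimnames(tt)[[1]]], # row names of tt as column "id"
  by = "id",                                    # join on the new column explicitly
  all = TRUE
)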
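as.data.table() and setDT() can also store the row names themselves through their keep.rownames argument, which saves building the id column by hand. The sketch below uses the question's matrices. The column name id and the object names zdt, ttdt and res are illustrative. It shows a left join that keeps only symbol, and then the equivalent data.table update join.

```r
library(data.table)

# Matrices from the question
z <- matrix(c(0,0,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,1,1,1,1,0,0,0,
              "RND1","WDR","PLAC8","TYBSA","GRA","TAF"), nrow = 6,
            dimnames = list(c("ILMN_1651838","ILMN_1652371","ILMN_1652464",
                              "ILMN_1652952","ILMN_1653026","ILMN_1653103"),
                            c("A","B","C","D","symbol")))
tt <- matrix(c("GO:0002009", 8, 342, 1, 0.07, 0.679, 0, 0, 1, 0,
               "GO:0030334", 6, 343, 1, 0.07, 0.065, 0, 0, 1, 0,
               "GO:0015674", 7, 350, 1, 0.07, 0.065, 1, 0, 0, 0), nrow = 10,
             dimnames = list(c("GO.ID","LEVEL","Annotated","Significant",
                               "Expected","resultFisher","ILMN_1652464",
                               "ILMN_1651838","ILMN_1711311","ILMN_1653026")))

# keep.rownames = "id" stores the row names in a column called "id";
# setDT(df, keep.rownames = "id") does the same for an existing data.frame
zdt  <- as.data.table(z,  keep.rownames = "id")
ttdt <- as.data.table(tt, keep.rownames = "id")

# Left join: every row of ttdt, plus only the "symbol" column of zdt
res <- merge(ttdt, zdt[, .(id, symbol)], by = "id", all.x = TRUE)
res

# Update join: adds "symbol" to ttdt by reference;
# ids with no match in zdt are left as NA
ttdt[zdt, on = "id", symbol := i.symbol]
ttdt
```

The update join modifies ttdt in place, so use the merge() form if ttdt must stay unchanged.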