I am wondering what the fastest way is to find all rows in an xts
object that are the same as one particular row:
library(xts)
nRows <- 3
coreData <- data.frame(a=rnorm(nRows), b=rnorm(nRows), c=rnorm(nRows))
testXts1 <- xts(coreData, order.by=as.Date(1:nRows))
testXts2 <- xts(coreData, order.by=as.Date((nRows + 1):(2*nRows)))
testXts3 <- xts(coreData, order.by=as.Date((2*nRows + 1):(3*nRows)))
testXts <- rbind(testXts1, testXts2, testXts3)
> testXts
a b c
1970-01-02 -0.3288756 1.441799 1.321608
1970-01-03 -0.7105016 1.639239 -2.056861
1970-01-04 0.1138675 -1.782825 -1.081799
1970-01-05 -0.3288756 1.441799 1.321608
1970-01-06 -0.7105016 1.639239 -2.056861
1970-01-07 0.1138675 -1.782825 -1.081799
1970-01-08 -0.3288756 1.441799 1.321608
1970-01-09 -0.7105016 1.639239 -2.056861
1970-01-10 0.1138675 -1.782825 -1.081799
rowToSearch <- first(testXts)
> rowToSearch
a b c
1970-01-02 -0.3288756 1.441799 1.321608
# TRUE for each row whose values all equal those of rowToSearch
indicesOfMatchingRows <- apply(testXts, 1, function(row) all(row == as.vector(coredata(rowToSearch))))
testXts[indicesOfMatchingRows, ]
a b c
1970-01-02 -0.3288756 1.441799 1.321608
1970-01-05 -0.3288756 1.441799 1.321608
1970-01-08 -0.3288756 1.441799 1.321608
I am sure this can be done in a more elegant and faster way.
A more general question is how to say in R: "I have this row matrix[5, ]; how can I find (the indices of) other rows in matrix that are the same as matrix[5, ]?"
How can this be done in data.table?
Since you said that speed is your main concern, you can get speedups even over a data.table solution with Rcpp:
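A minimal sketch of what such an Rcpp function could look like, assuming a numeric matrix input and exact (bitwise) equality of values. The name compareToRows is the one used in the benchmark discussion below, but the body shown here is an illustrative assumption, not necessarily the code that was benchmarked:

```r
library(Rcpp)

# Illustrative sketch (an assumption, not the benchmarked code):
# returns the 1-based indices of the rows of x whose values all equal y.
cppFunction('
IntegerVector compareToRows(NumericMatrix x, NumericVector y) {
  const int nr = x.nrow();
  const int nc = x.ncol();
  if (y.size() != nc) stop("y must have one value per column of x");
  std::vector<int> matches;
  for (int i = 0; i < nr; ++i) {
    bool same = true;
    for (int j = 0; j < nc; ++j) {
      if (x(i, j) != y[j]) {
        same = false;
        break;  // stop at the first mismatching column
      }
    }
    if (same) matches.push_back(i + 1);  // R indices are 1-based
  }
  return IntegerVector(matches.begin(), matches.end());
}')

m <- rbind(c(1, 2, 3), c(4, 5, 6), c(1, 2, 3))
compareToRows(m, m[1, ])  # rows 1 and 3 match
```

For the xts object from the question, something like testXts[compareToRows(coredata(testXts), as.numeric(coredata(rowToSearch))), ] should select the matching rows. The early break is where most of the gain would come from: a non-matching row is usually rejected after its first column.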
Here's a comparison on a fairly large instance (with 1 million rows):
This benchmark assumes the object has been converted to a data frame (~4 seconds overhead) before calling roland.dt and that compareToRows has been compiled (~3 seconds overhead) before calling josilber. The Rcpp solution is about 300x faster than the base R solution and about 4x faster than the data.table solution in median runtime. The approach based on digest was not competitive, taking more than 60 seconds to execute each time.
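If compiling is not an option, the general matrix question can also be handled in vectorized base R. The following is a minimal sketch, assuming a numeric matrix without NA values and exact equality of values; the helper name matchingRows is hypothetical, and this is not one of the functions timed above:

```r
# Hypothetical helper: indices of the rows of m that are identical to target.
matchingRows <- function(m, target) {
  stopifnot(length(target) == ncol(m))
  # t(m) is ncol x nrow, so target is recycled down each column of t(m),
  # which compares it against every row of m in one vectorized pass.
  which(colSums(t(m) != target) == 0)
}

m <- rbind(c(1, 2, 3), c(4, 5, 6), c(1, 2, 3), c(1, 2, 4))
matchingRows(m, m[1, ])  # rows 1 and 3 match
```

Unlike a compiled loop with an early exit, this compares every cell of the matrix and allocates a transposed copy, so it trades speed and memory for brevity.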