I am working with data.table in R. The data has multiple records per id, and I am trying to extract the nth record for each id using .SD. If I specify N as a literal integer (e.g. .SD[2]), the new data.table is created almost instantaneously. But if N is a variable (as it would be inside a function), the code takes about 700 times longer. With large data sets, this is a problem. Is this a known issue, and is there any way to speed it up?
    library(data.table)
    library(microbenchmark)

    set.seed(102938)

    dd <- data.table(id = rep(1:10000, each = 10), seq = seq(1:10))
    setkey(dd, id)

    N <- 2

    microbenchmark(dd[,.SD, keyby = id],
                   dd[,.SD[N], keyby = id],
                   times = 5)
    #> Unit: microseconds
    #>                      expr        min         lq       mean     median         uq        max neval
    #>     dd[, .SD, keyby = id]    886.269   1584.513   2904.497   1851.356   1997.134   8203.214     5
    #>  dd[, .SD[N], keyby = id] 770822.875 810131.784 870418.622 903956.708 912223.026 954958.718     5
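
For comparison, here is a minimal sketch of two rewrites that may avoid the slowdown. It assumes the cost comes from `.SD[N]` being evaluated as a full data.table subset once per group, which I have not confirmed. The first variant collects the row number of the Nth record per id with `.I` and subsets `dd` once; the second indexes each column vector directly with `lapply`. The names `n_ids`, `slow`, `idx`, `fast_I` and `fast_lapply` are made up for this illustration, and the example uses fewer ids than the benchmark above so it runs quickly.

```r
library(data.table)

# Smaller illustrative data set: 1000 ids with 10 records each (assumed sizes)
n_ids <- 1000
dd <- data.table(id = rep(seq_len(n_ids), each = 10),
                 seq = rep(1:10, times = n_ids))
setkey(dd, id)

N <- 2

# Baseline from the question: subset .SD with a variable inside each group
slow <- dd[, .SD[N], keyby = id]

# Variant 1: get the row number of the Nth record per id, then subset once
idx <- dd[, .I[N], keyby = id]$V1
fast_I <- dd[idx]

# Variant 2: index each column vector directly instead of subsetting .SD
fast_lapply <- dd[, lapply(.SD, `[`, N), keyby = id]
```

Both variants should return the same rows as `dd[, .SD[N], keyby = id]` when every id has at least N records. Ids with fewer than N records would produce NA rows and probably need separate handling. Whether either variant is actually faster on a given data.table version is best checked with `microbenchmark`.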