Fastest way to replicate a data.frame

I'm looking for the fastest way to replicate a data.frame. Is there a rep.data.frame method that replicates rows? And what is the fastest way to do this for all inputs? I have a function that needs to replicate an object x that can be either a vector or a data.frame.

The code I'm currently using:

repx <- function(x, ...) if (is.atomic(x)) rep(x, ...) else x[rep(seq_len(nrow(x)), ...), , drop = FALSE]

I used @ronak's answer to come up with a close-enough solution, but what I actually want is output ordered the same way rep orders a vector. See the output below:

library(data.table)  # provides rbindlist()
rep.data.frame <- function(x, each, times) rbindlist(replicate(times, rbindlist(replicate(each, x, simplify = FALSE)), simplify = FALSE))

rep(data.frame(y=1:2), times=3, each=2)
    y
 1: 1
 2: 2
 3: 1
 4: 2
 5: 1
 6: 2
 7: 1
 8: 2
 9: 1
10: 2
11: 1
12: 2

# Desired output
    y
 1: 1
 2: 1
 3: 2
 4: 2
 5: 1
 6: 1
 7: 2
 8: 2
 9: 1
10: 1
11: 2
12: 2

There is 1 answer

Answer by edsandorf:

You can write a small function that repeats each row of a data.frame (or each element of a vector) a specified number of times. This is in fact very similar to what you are already doing. Note that I couldn't get rep.data.frame to give your desired output. A simple function could look like this:

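# Repeat each row of a matrix/data.frame, or each element of a vector,
# `times` times in place (equivalent to rep(..., each = times)).
rep_rows <- function(x, times) {
  if (is.matrix(x) || is.data.frame(x)) {
    # drop = FALSE keeps single-column inputs as a matrix/data.frame
    x[rep(seq_len(nrow(x)), each = times), , drop = FALSE]
  } else {
    x[rep(seq_along(x), each = times)]
  }
}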

Let's create two objects to test the code:

db <- data.frame(
  y = rep(1:2, times = 3)
)

y <- rep(1:2, times = 3)

db looks like this:

> db
  y
1 1
2 2
3 1
4 2
5 1
6 2

and y looks like this:

> y
[1] 1 2 1 2 1 2

Using our function, we get:

> rep_rows(db, 2)
    y
1   1
1.1 1
2   2
2.1 2
3   1
3.1 1
4   2
4.1 2
5   1
5.1 1
6   2
6.1 2

and

> rep_rows(y, 2)
 [1] 1 1 2 2 1 1 2 2 1 1 2 2
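To get the ordering shown under "Desired output" in the question from the original two-row input, one option is to pass both times and each through to rep() on the row indices, since rep() applies each before times. A minimal sketch, where rep_df is a hypothetical helper name and not an existing base R function:

```r
# rep_df is a hypothetical helper name, not part of base R.
# It forwards `times` and `each` to rep() on the row indices, so a
# data.frame is replicated in the same order rep() uses for vectors.
rep_df <- function(x, times = 1L, each = 1L) {
  idx <- rep(seq_len(nrow(x)), times = times, each = each)
  out <- x[idx, , drop = FALSE]
  rownames(out) <- NULL  # reset the "1.1", "2.1" style row names
  out
}

res <- rep_df(data.frame(y = 1:2), times = 3, each = 2)
res$y
# [1] 1 1 2 2 1 1 2 2 1 1 2 2
```

The same body could be registered as an S3 method named rep.data.frame if you want rep(df, ...) to dispatch to it, but that changes how rep() treats every data.frame in the session, so a separately named helper is probably the safer choice.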

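EDIT: Benchmarked on larger data, this is still reasonably quick: repeating each row of a 5,000-row data.frame 100 times (500,000 output rows) has a median time of about 286 ms. I'm curious to see how it compares to other approaches.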

> db <- data.frame(
+   y = rep(1:5, times = 1000)
+ )
> microbenchmark::microbenchmark(rep_rows(db, 100))
Unit: milliseconds
              expr      min       lq    mean   median       uq      max neval
 rep_rows(db, 100) 259.0079 279.6223 294.129 285.9272 307.0718 349.6123   100
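As one possible point of comparison, here is a sketch of a column-wise alternative: replicate each column with rep() and rebuild the data.frame, which avoids the row-name handling that row subsetting does. The name rep_cols is hypothetical, and the sketch assumes ordinary vector columns (no matrix or nested data.frame columns). It may be faster, but I have not included timings here, so measure on your own data:

```r
# Row-index approach from above, redefined so this block runs on its own.
rep_rows <- function(x, times) {
  x[rep(seq_len(nrow(x)), each = times), , drop = FALSE]
}

# Hypothetical alternative: replicate column by column, then rebuild.
# Assumes plain vector columns (no matrix or data.frame columns).
rep_cols <- function(x, times) {
  cols <- lapply(x, rep, each = times)
  data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE)
}

db <- data.frame(y = rep(1:5, times = 1000))

# Both approaches should produce the same values.
stopifnot(identical(rep_rows(db, 100)$y, rep_cols(db, 100)$y))

# Rough timing with base R only; results vary by machine and R version.
system.time(rep_rows(db, 100))
system.time(rep_cols(db, 100))
```

Note that rep_cols returns default row names (1, 2, 3, ...) rather than the 1, 1.1, 2, 2.1 style names that row subsetting produces.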