While creating some sampling functions for already aggregated data, I found that table was rather slow on data of the size I am working with. I tried two improvements. The first was an Rcpp function, as follows:
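#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector getcts(NumericVector x, int m) {
    IntegerVector cts(m);   // counts for values 1..m, zero-initialised
    int t;
    for (int i = 0; i < x.length(); i++) {
        t = x[i] - 1;       // 1-based value to 0-based index
        if (0 <= t && t < m)
            cts[t]++;
    }
    return cts;
}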
Then, while trying to understand why table was rather slow, I found that it is based on tabulate. tabulate works well for me, and is faster than the Rcpp version. The code for tabulate is at:
https://github.com/wch/r-source/blob/545d365bd0485e5f0913a7d609c2c21d1f43145a/src/main/util.c#L2204
With the key line being:
for(R_xlen_t i = 0 ; i < n ; i++)
    if (x[i] != NA_INTEGER && x[i] > 0 && x[i] <= nb) y[x[i] - 1]++;
Now the key parts of tabulate and my Rcpp version seem pretty close (I have not bothered dealing with NA).
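To check that the two range tests really do accept the same values for integer input, here is a small standalone sketch. The function names are made up for illustration, plain int stands in for the R vector element types, and the NA_INTEGER check is left out, so this only compares the bounds logic.

```cpp
#include <cassert>

// Bounds test as written in getcts: shift to a 0-based index, then check it.
bool in_range_getcts(int x, int m) {
    int t = x - 1;
    return 0 <= t && t < m;
}

// Bounds test as written in tabulate, leaving out the NA_INTEGER check.
bool in_range_tabulate(int x, int nb) {
    return x > 0 && x <= nb;
}

// Returns true if both tests agree for every x in [lo, hi].
bool tests_agree(int lo, int hi, int m) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++) {
        if (in_range_getcts(x, m) != in_range_tabulate(x, m)) return false;
    }
    return true;
}
```

Calling tests_agree(-10, 110, 100) covers values below, inside, and above the 1..100 range used in the timing code.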
Q1: why is my Rcpp version 3 times slower?
Q2: how can I find out where this time goes?
I would very much appreciate knowing where the time went, but even better would be a good way to profile the code. My C++ skills are only so-so, but this seems simple enough that I should (fingers crossed) have been able to avoid any silly mistakes that would triple my time.
My timing code:
max_x <- 100
xs <- sample(seq(max_x), size = 50000000, replace = TRUE)
system.time(getcts(xs, max_x))
system.time(tabulate(xs))
This gives 0.318 seconds for getcts and 0.126 seconds for tabulate.
Your function calls the length method in each loop iteration. It seems the compiler doesn't cache it. To fix this, store the size of the vector in a separate variable or use a range-based loop. Also note that we don't really need an explicit missing-value check, because in C++ comparisons such as < and <= involving a NaN always return false.
Let's compare performance:
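As a rough standalone sketch of the variants described above (not the Rcpp code itself), the three loops below differ only in how the length is obtained. The assumptions: std::vector stands in for NumericVector and IntegerVector so it compiles without R, and the names count_size_in_loop, count_cached_size, count_range_for and time_ms are hypothetical. The range test is done on the double value before converting it, so a NaN is skipped without an explicit check.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Variant A: x.size() is evaluated in the loop condition on every pass.
std::vector<int> count_size_in_loop(const std::vector<double>& x, int m) {
    assert(m >= 0);
    std::vector<int> cts(m, 0);
    for (std::size_t i = 0; i < x.size(); i++) {
        double v = x[i];
        // NaN fails both comparisons, so it never reaches the cast.
        if (v >= 1.0 && v < m + 1.0) cts[static_cast<int>(v) - 1]++;
    }
    return cts;
}

// Variant B: the size is stored once in a local variable.
std::vector<int> count_cached_size(const std::vector<double>& x, int m) {
    assert(m >= 0);
    std::vector<int> cts(m, 0);
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; i++) {
        double v = x[i];
        if (v >= 1.0 && v < m + 1.0) cts[static_cast<int>(v) - 1]++;
    }
    return cts;
}

// Variant C: range-based for loop, no index or size at all.
std::vector<int> count_range_for(const std::vector<double>& x, int m) {
    assert(m >= 0);
    std::vector<int> cts(m, 0);
    for (double v : x) {
        if (v >= 1.0 && v < m + 1.0) cts[static_cast<int>(v) - 1]++;
    }
    return cts;
}

// Wall-clock time of one call to f, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as time_ms([&] { r = count_cached_size(x, 100); }) on each variant, with r declared outside the lambda, gives a rough comparison. With std::vector an optimising compiler may well hoist size() on its own, so the gap seen here need not match what happens with the Rcpp length method.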
tabulate1 is the original version.
Benchmark results:
Without NA:
With NA:
The tabulate4 function, which uses an iterator, is also slower than tabulate. We can improve it: