I've inherited R some code and it runs incredibly slowly. Most of the time is spent evaluating the functions of the form (there are about 15 such functions with different integrands G):
TMin <- 0.5
F <- function (t, d) {
result <- ifelse(((d > 0) & (t > TMin)),
mapply(function(t, d) integrate(G, lower=0, upper=t, t, d)$value, t, d),
0)
return(result)
}
For testing, I'm using the following dummy function, but in the real code the Gs are much more complicated involving exp(), log(), dlnorm(), plnorm() etc..
G <- function(x, t, d) {
mean(rnorm(1e5))
x + t - d
}
F will be calculated around 2 million times in the worst case.
The function gets called in 3 different ways, either:
t is a single number and d is a numeric vector or,
t is a numeric vector and d is a single number or,
t is a numeric vector and is a numeric vector
Is there a (simple) way to speed up this function?
For far I've tried variations along the lines of (to get rid of the ifelse loop):
F2 <- function (t,d) {
TempRes <- mapply(function(t, d) integrate(G, lower=0, upper=t, t, d)$value, t, d)
TempRes[(d <= 0) | (t <= TMin)] <- 0
result <- TempRes
return(result)
}
and
F3 <- function (t,d) {
result <- rep(0, max(length(t),length(d)))
test <- ((d > 0) & (t > TMin))
result[test] <- mapply(function(t, d) integrate(G, lower=0, upper=t, t, d)$value, t, d)[test]
return(result)
}
but they take almost exactly the same time.
You are performing a large number of independent integrations. You can speed things up by performing these integrations on separate cores simultaneously (if you have a multicore processor available). The problem is that R performs its calculations in a single threaded manner by default. However, there are a number packages available that allow multithreading support. I have recently answered a few similar questions here and here, with some additional info regarding the relevant packages and functions.
Additionally, As @Mike Dunlavey already mentioned, you should avoid performing the integrations for values of
t
andd
that do not match your criteria. (You are currently performing unneeded function evaluations for these values and then you overwrite the result with 0 afterwards).I have added a possible improvement below. Note that you will have to create a separate file with your function
G
included in order to evaluate it on the cluster nodes. In the code below it is assumed that this file is calledfunctionG.R
The snippet:
My CPU contains 6 physical cores with hyperthreading enabled (x2). These are the results:
It seems that the cores are constantly in use while running this code. However, you can potentially optimize this code further by presplitting the data more efficiently around the cores and then subsequently utilize an apply type functions on the separate cores.
If more optimization is required you could also take a deeper look at the
integrate
function. You can potentially play around with the settings and obtain a performance gain by allowing a less strict numerical approximation. As an alternative you could implement your own simple version of adaptive Simpson quadrature and play around with the discrete stepsizes. Most likely you could obtain massive performance increases like this (if you are able/willing to allow more error in the approximation).EDIT: Updated code in order for it to work in all scenario's:
d
and/ort
valid/invalid numbers or vectorsREPLY TO COMMENTS @mawir: you are correct.
ifelse(test, yes, no)
will return correspondingyes
values for the rows of which test evaluates toTRUE
, it will return the respectiveno
values for the rows for whichtest
evaluate toFALSE
. However, it WILL first have to evaluate youryes
expression in order to create theyes
vector oflength(test)
. This piece of code demonstrates this:Only the last 5 values of
t
andd
are relevant in this scenario. However, inside theF1
function theifelse
evaluates themapply
on alld
andt
values first in order to create theyes
vector. This is why the function execution takes so long. Next, it selects the elements for which the conditions are met, or 0 otherwise. TheF4
function works around this issue.Futhermore, you say that you obtain speedup in the scenario where
t
andd
are non-vectors. However, in this scenario no parallelisation is used. You should normally obtain the maximum speedup in the senario's where one or both oft
/d
are vectors.ANOTHER EDIT, in response to Roland's comment: You could potentially replace
clusterEvalQ(cl,eval(parse("functionG.R")))
withclusterExport(cl,"G")
if you prefer not to create a separate function file(s).