%dopar% or alternative method to speed up sequential stochastic calculation

287 views Asked by At

I have written a stochastic process simulator but I would like to speed it up since it's pretty slow.

The main part of the simulator is made of a for loop which I would like to re-write as a foreach with `%dopar%.

I have tried doing so with a simplified loop but I'm running into some problems. Suppose my for loop looks like this

library(foreach)

r=0
t<-rep(0,500)
for(n in 1:500){
    s<-1/2+r
    u<-runif(1, min = 0, max = 1)
    if(u<s){
        t[n]<-u
        r<-r+0.001
    }else{r<-r-0.001}
}

which means that at each iteration I update the value of r and s and, in one of the two outcomes, populate my vector t. I have tried several different ways of re-writing it as a foreach loop but it seems like with each iteration my values don't get updated and I get some pretty strange results. I have tried using return but it doesn't seem to work!

This is an example of what I have come up with.

rr=0
tt<-foreach(i=1:500, .combine=c) %dopar% {
    ss<-1/2+rr
    uu<-runif(1, min = 0, max = 1)
    if(uu<=ss){
        return(uu)
        rr<-rr+0.001
    }else{
        return(0)
        rr<-rr-0.001}
}

If it is impossible to use foreach what other way is there for me to re-write the loop so to be able to use all cores and speed up things?

2

There are 2 answers

1
alexis_laz On BEST ANSWER

Since your comments, about turning to C, were encouraging and -mostly- to prove that this isn't a hard task (especially for such operations) and it's worth looking into, here is a comparison of two sample functions that accept a number of iterations and perform the steps of your loop:

ffR = function(n) 
{
   r = 0
   t = rep(0, n)
   for(i in 1:n) {
       s = 1/2 + r
       u = runif(1)
       if(u < s) {
           t[i] = u
           r = r + 0.001
       } else r = r - 0.001
   }

   return(t)
}


ffC = inline::cfunction(sig = c(R_n = "integer"), body = '
    int n = INTEGER(AS_INTEGER(R_n))[0];

    SEXP ans;
    PROTECT(ans = allocVector(REALSXP, n));

    double r = 0.0, s, u, *pans = REAL(ans);

    GetRNGstate();

    for(int i = 0; i < n; i++) {
        s = 0.5 + r;
        u = runif(0.0, 1.0);

        if(u < s) {
            pans[i] = u;
            r += 0.001;
        } else {
            pans[i] = 0.0;
            r -= 0.001;
        }
    }

    PutRNGstate();

    UNPROTECT(1);
    return(ans);
', includes = "#include <Rmath.h>")

A comparison of results:

set.seed(007); ffR(5)
#[1] 0.00000000 0.39774545 0.11569778 0.06974868 0.24374939
set.seed(007); ffC(5)
#[1] 0.00000000 0.39774545 0.11569778 0.06974868 0.24374939

A comparison of speed:

microbenchmark::microbenchmark(ffR(1e5), ffC(1e5), times = 20)
#Unit: milliseconds
#       expr        min         lq     median         uq        max neval
# ffR(1e+05) 497.524808 519.692781 537.427332 668.875402 692.598785    20
# ffC(1e+05)   2.916289   3.019473   3.133967   3.445257   4.076541    20

And for the sake of completeness:

set.seed(101); ans1 = ffR(1e5)
set.seed(101); ans2 = ffC(1e5)
all.equal(ans1, ans2)
#[1] TRUE

Hope any of this could be helpful in some way.

0
OganM On

What you are trying to do, since every iteration is dependent on the previous steps of the loop, doesn't seem to be parallelizable. You are updating the variable r and expecting other branches that are running simultaneously to know about it, and in fact wait for the update to happen, which
1) Doesn't happen. They won't wait, they'll just take r's current value whatever that is at the time they are running
2) If it did it would be same as running it without %dopar%