Parallel Bootstrap t-procedure for confidence bands


I'm implementing a bootstrap-t procedure for confidence bands for a statistic. Here is my code:

#Compute bootstrap variance
bt.var<-function(x,statistic,R=10000){
    var(replicate(R,statistic(sample(x,replace=TRUE))))
} 

#Compute studentized bootstrap statistic
bt.one.student<-function(x, statistic.0, statistic,R=10000){
    (statistic(x)-statistic.0)/sqrt(bt.var(x,statistic,R))
}


#Compute 95% confidence bands
bt.student<-function(x,statistic,R1=10000,R2=10000){
    statistic.0<-statistic(x)
    one.boot<-function(x,statistic.0,statistic,R2){
        x.star<-sample(x,replace=TRUE)
        theta.hat<-statistic(x.star)
        out<-bt.one.student(x.star,statistic.0,statistic,R2)
        c(theta.hat,out)
    }
    output<-replicate(R1, one.boot(x,statistic.0,statistic,R2))
    var.est<-var(output[1,])
    q<-quantile(output[2,], c(0.025, 0.975))
    c(statistic.0-sqrt(var.est)*q[2], statistic.0-sqrt(var.est)*q[1])
} 

Since I want to implement the function bt.student() using the parallel package to take advantage of multi-cores, I'm using the following code:

library(parallel)
cl<-makeCluster(detectCores())
bt.var<-function(x,statistic,R=10000){
          var(parSapply(cl, 1:R, function(i) statistic(sample(x,replace=TRUE))))
}

bt.one.student<-function(x, statistic.0, statistic,R=10000){
    (statistic(x)-statistic.0)/sqrt(bt.var(x,statistic,R))
}

one.boot<-function(x,statistic.0,statistic,R2){
        x.star<-sample(x,replace=TRUE)
        theta.hat<-statistic(x.star)
        out<-bt.one.student(x.star,statistic.0,statistic,R2)
        c(theta.hat,out)
    }

bt.student<-function(x,statistic,R1=10000,R2=10000){
    statistic.0<-statistic(x)
    output<-parSapply(cl, 1:R1, function(i) one.boot(x,statistic.0,statistic,R2) )
    var.est<-var(output[1,])
    q<-quantile(output[2,], c(0.025, 0.975))
    c(statistic.0-sqrt(var.est)*q[2], statistic.0-sqrt(var.est)*q[1])
}

clusterExport(cl, c("bt.var","bt.one.student","one.boot"))

clusterSetRNGStream(cl)

x<-rnorm(40,mean=3,sd=2)

clusterExport(cl, "x")

bt.student(x,mean,R1=150,R2=150)

I get the following error:

Error in checkForRemoteErrors(val) :
  4 nodes produced errors; first error: could not find function "parSapply"

Do you know why I get this error? I have to use parSapply because the parallel package has no parallel equivalent of replicate.


There are 2 answers

Steve Weston (best answer)

It looks like you're trying to use nested parallelism: "bt.student" calls "parSapply" on the master, and each task then reaches "bt.var", which tries to call "parSapply" again from inside a worker. That is rather tricky to do and often isn't necessary. To make your example work, you'd have to create a cluster object on each worker, but then you'd have far too many workers, which could badly bog down your machine.

I suggest that you revert "bt.var" to the original sequential version, and only use "parSapply" in "bt.student". With the default R1, that gives you 10,000 good-sized tasks, which should work well and make good use of your cores.
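
A minimal sketch of that suggestion, reusing the function names from the question. Only the outer loop in "bt.student" goes through "parSapply"; the inner bootstrap in "bt.var" is the original sequential version. The two-worker cluster, the seeds, and the small R1/R2 values are arbitrary choices for illustration, not part of the original code.

```r
library(parallel)

# Inner bootstrap variance: sequential, so it can run inside a worker
# without needing a cluster object there.
bt.var <- function(x, statistic, R = 10000) {
  var(replicate(R, statistic(sample(x, replace = TRUE))))
}

# Studentized bootstrap statistic for one resample.
bt.one.student <- function(x, statistic.0, statistic, R = 10000) {
  (statistic(x) - statistic.0) / sqrt(bt.var(x, statistic, R))
}

# One outer bootstrap replicate: the estimate and its studentized value.
one.boot <- function(x, statistic.0, statistic, R2) {
  x.star <- sample(x, replace = TRUE)
  theta.hat <- statistic(x.star)
  out <- bt.one.student(x.star, statistic.0, statistic, R2)
  c(theta.hat, out)
}

# Only this outer loop is parallel; it uses the global cluster object cl,
# as in the question.
bt.student <- function(x, statistic, R1 = 10000, R2 = 10000) {
  statistic.0 <- statistic(x)
  output <- parSapply(cl, 1:R1,
                      function(i) one.boot(x, statistic.0, statistic, R2))
  var.est <- var(output[1, ])
  q <- quantile(output[2, ], c(0.025, 0.975))
  unname(c(statistic.0 - sqrt(var.est) * q[2],
           statistic.0 - sqrt(var.est) * q[1]))
}

cl <- makeCluster(2)                 # assumption: 2 workers for the sketch
clusterExport(cl, c("bt.var", "bt.one.student", "one.boot"))
clusterSetRNGStream(cl, 123)         # assumption: arbitrary seed

set.seed(1)                          # assumption: arbitrary seed
x <- rnorm(40, mean = 3, sd = 2)

ci <- bt.student(x, mean, R1 = 150, R2 = 150)
print(ci)

stopCluster(cl)
```

Because the workers only ever call the sequential helpers, they need neither the parallel package attached nor a cluster object of their own.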

Oliver Keyes

Because the worker processes are freshly started R sessions, they only have the default packages attached. That means the parallel package is not attached on the workers, so a function running there cannot find parSapply.

Try adding clusterEvalQ(cl, library(parallel)).
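
As a small sketch of what that call does, the snippet below checks on each worker whether parallel is on the search path before and after clusterEvalQ. The two-worker cluster is an arbitrary choice for illustration. Note that attaching the package only makes parSapply visible on the workers; they would still have no cluster object of their own to pass to it.

```r
library(parallel)

cl <- makeCluster(2)   # assumption: 2 workers for the sketch

# Is "package:parallel" on each worker's search path before attaching it?
before <- unlist(clusterEvalQ(cl, "package:parallel" %in% search()))

# Attach the package on every worker.
invisible(clusterEvalQ(cl, library(parallel)))

# Check again after attaching.
after <- unlist(clusterEvalQ(cl, "package:parallel" %in% search()))

print(before)
print(after)

stopCluster(cl)
```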