Make program based on kernlab function run faster

141 views Asked by At

For a range I want to identify where changes in distribution happens and where that value is the maximum. Currently I am using a kernel maximum discrepancy test for every value in the range and I am taking the 200 values before and after that value then I extract the locations where the mmd statistic is maximum. But this is very computationally intensive to calculate in R. Please note that I am using kernlab to calculate kmmd. I want to know if there is a way to do this faster? Or if you have any suggestions. Any help would be appreciated.

My code is:

    cvg<-seq(1,2000)
    cvg<-cvg^3-2*cvg^2+5*cvg
    myRange<-seq(400:(length(cvg)-400))
    kernel<-"splinedot"
    cvg[201:(length(cvg)-200)]->cvg
    myRange<-seq(400:(length(cvg)-400))
    lapply(myRange, function(x) mmdstats(kmmd((as.matrix(cvg[(x+1):(x+400)])), (as.matrix(cvg[(x+801):(x+1200)])), kernel=kernel)))->kmm.ls
    as.data.frame(as.matrix(kmm.ls))->kmm.ls
    lapply(kmm.ls, function(x) which.max(mmdstats(x)))->store.max
1

There are 1 answers

0
Elia On BEST ANSWER

I state that I am not an expert on the subject with kernlab so I can not judge the correctness of your analysis or improve your code. However, I can suggest you convert your lapply call to a parallelized version such as sfLapply, parLapply, mclapply future_lapply ecc. Here I post an example with sfLapply from the snowfall package(which is really straightforward imo):

#your original lapply call took 500 seconds on my PC
system.time(kmm.ls <- lapply(myRange, function(x) mmdstats(kmmd((as.matrix(cvg[(x+1): 
(x+400)])), (as.matrix(cvg[(x+801):(x+1200)])), kernel=kernel))))


library(kernlab)
library(snowfall)
sfInit(parallel=TRUE,cpus = parallel::detectCores()-1)
# Load the required packages inside the cluster
sfLibrary(kernlab)
#export all variable in all the cluster
sfExportAll()
# Run parallelized lapply with custom function  
#sfLapply took 22 second on my 48 cores PC
system.time(kmm.ls <- sfLapply(myRange, function(x) 
mmdstats(kmmd((as.matrix(cvg[(x+1):(x+400)])), (as.matrix(cvg[(x+801):(x+1200)])), 
kernel=kernel))))
#stop cluster
sfStop()

This is an example with only the first lapply call of your code, but the same idea can be applied to the second call (when I tried to run your code, the second lapply call give me an error)

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘mmdstats’ for signature ‘"list"’

It doesn't seem a critical error but as I said I don't feel prepared to advise how to fix it.