Over a range of positions, I want to identify where the distribution changes, i.e. the position where the MMD statistic is at its maximum. Currently I run a kernel maximum mean discrepancy (MMD) test at every value in the range: I take the 200 values before and after that value, compute the MMD statistic, and then extract the locations where the statistic is maximal. However, this is very computationally intensive in R. Note that I am using kernlab's kmmd() to compute the test. Is there a way to do this faster, or do you have any other suggestions? Any help would be appreciated.
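For reference, the statistic being maximised is a squared MMD between two windows. Its standard biased estimate, for a window $x_1, \dots, x_m$ on one side of the position, a window $y_1, \dots, y_n$ on the other side, and a kernel $k$, is shown here (I believe kmmd's first-order statistic is the square root of this quantity, but that is an assumption worth checking in the kernlab documentation):

$$
\widehat{\mathrm{MMD}}_b^2(X, Y) = \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m} k(x_i, x_j) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} k(y_i, y_j) - \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} k(x_i, y_j)
$$

Each evaluation touches $m^2 + n^2 + mn$ kernel values; with the 400-point windows used in the code that is 480,000 kernel evaluations per position, repeated for every position in the range.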
My code is:
library(kernlab)
cvg<-seq(1,2000)
cvg<-cvg^3-2*cvg^2+5*cvg
kernel<-"splinedot"
cvg[201:(length(cvg)-200)]->cvg
# x runs from 1 to length(cvg)-1200 so that the second window,
# cvg[(x+801):(x+1200)], stays inside cvg (length 1600) instead of picking up NAs
myRange<-seq_len(length(cvg)-1200)
lapply(myRange, function(x) mmdstats(kmmd(as.matrix(cvg[(x+1):(x+400)]), as.matrix(cvg[(x+801):(x+1200)]), kernel=kernel)))->kmm.ls
as.data.frame(as.matrix(kmm.ls))->kmm.ls
lapply(kmm.ls, function(x) which.max(mmdstats(x)))->store.max
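One way to avoid recomputing three kernel matrices at every position is to compute the kernel matrix of the whole series once and read each window's three block sums from 2D cumulative sums, so each position costs O(1) after an O(n²) setup. This is a sketch under assumptions: it swaps in a Gaussian (RBF) kernel written in base R instead of splinedot, computes the biased squared MMD directly rather than calling kmmd, standardizes the series so that sigma = 1 is a sensible (but arbitrary) bandwidth, and uses hypothetical helper names (sliding_mmd2, mmd2_direct).

```r
# Sketch: sliding-window biased MMD^2 from one precomputed kernel matrix.
# Assumptions: Gaussian (RBF) kernel instead of kernlab's splinedot,
# hypothetical helper names, equal-length windows of size w separated by `gap`.

# Biased MMD^2 for the windows v[(x+1):(x+w)] and v[(x+w+gap+1):(x+2*w+gap)],
# for every start x = 1, ..., length(v) - 2*w - gap.
sliding_mmd2 <- function(v, w, gap, sigma = 1) {
  n <- length(v)
  # Full Gaussian kernel matrix, computed once: K[i, j] = k(v[i], v[j])
  K <- exp(-outer(v, v, "-")^2 / (2 * sigma^2))
  # P[i + 1, j + 1] = sum(K[1:i, 1:j]); the zero first row/column simplifies lookups
  P <- matrix(0, n + 1, n + 1)
  P[-1, -1] <- t(apply(apply(K, 2, cumsum), 1, cumsum))
  # Sum of K[r1:r2, c1:c2] in O(1) by inclusion-exclusion on P
  block <- function(r1, r2, c1, c2) {
    P[r2 + 1, c2 + 1] - P[r1, c2 + 1] - P[r2 + 1, c1] + P[r1, c1]
  }
  starts <- seq_len(n - 2 * w - gap)
  vapply(starts, function(x) {
    xa <- x + 1; xb <- x + w                      # first window
    ya <- x + w + gap + 1; yb <- x + 2 * w + gap  # second window
    (block(xa, xb, xa, xb) + block(ya, yb, ya, yb) - 2 * block(xa, xb, ya, yb)) / w^2
  }, numeric(1))
}

# Brute-force version of the same statistic, useful for checking the fast one
mmd2_direct <- function(x, y, sigma = 1) {
  k <- function(a, b) exp(-outer(a, b, "-")^2 / (2 * sigma^2))
  mean(k(x, x)) + mean(k(y, y)) - 2 * mean(k(x, y))
}

# Usage with the series from the question (standardized so sigma = 1 is sensible)
cvg <- seq(1, 2000)
cvg <- cvg^3 - 2 * cvg^2 + 5 * cvg
cvg <- cvg[201:(length(cvg) - 200)]
stat <- sliding_mmd2(as.numeric(scale(cvg)), w = 400, gap = 400)
best_x <- which.max(stat)  # start index x, as in cvg[(x+1):(x+400)]
```

With w = 400 and gap = 400 the windows are exactly cvg[(x+1):(x+400)] and cvg[(x+801):(x+1200)] from the question. To keep the splinedot kernel, the K line could presumably be replaced by as.matrix(kernelMatrix(splinedot(), as.matrix(v))) from kernlab, with the rest unchanged; I have not checked whether the resulting values match what kmmd reports.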
I should say up front that I am not an expert on kernlab, so I cannot judge the correctness of your analysis or improve your code. However, I can suggest converting your lapply call to a parallelized version such as sfLapply (from snowfall), parLapply or mclapply (from parallel), future_lapply (from future.apply), etc. Here is an example with sfLapply from the snowfall package (which is really straightforward, imo):

This is an example with only the first lapply call of your code, but the same idea can be applied to the second call (when I tried to run your code, the second lapply call gave me an error). It doesn't seem to be a critical error, but as I said, I don't feel prepared to advise how to fix it.
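A minimal sketch of the same idea using parLapply from the parallel package, which ships with R, rather than snowfall. To keep it runnable without kernlab, the per-window work is a hypothetical base-R helper (mmd2_gauss: biased squared MMD with a Gaussian kernel on the standardized series) standing in for the kmmd call; the two-worker cluster size is an arbitrary assumption.

```r
library(parallel)

# Hypothetical stand-in for kmmd: biased MMD^2 with a Gaussian kernel (base R only)
mmd2_gauss <- function(x, y, sigma = 1) {
  k <- function(a, b) exp(-outer(a, b, "-")^2 / (2 * sigma^2))
  mean(k(x, x)) + mean(k(y, y)) - 2 * mean(k(x, y))
}

cvg <- seq(1, 2000)
cvg <- cvg^3 - 2 * cvg^2 + 5 * cvg
# Trim as in the question, then standardize (assumption, so sigma = 1 is sensible)
cvg <- as.numeric(scale(cvg[201:(length(cvg) - 200)]))
myRange <- seq_len(length(cvg) - 1200)  # keeps the second window inside cvg

# Statistic for one start position x: same windows as in the question
window_stat <- function(x) mmd2_gauss(cvg[(x + 1):(x + 400)], cvg[(x + 801):(x + 1200)])

cl <- makeCluster(2)                       # worker processes: adjust to your machine
clusterExport(cl, c("cvg", "mmd2_gauss"))  # workers start empty, so ship data and helpers
kmm.ls <- parLapply(cl, myRange, window_stat)
stopCluster(cl)

stats <- unlist(kmm.ls)
store.max <- myRange[which.max(stats)]     # position with the largest statistic
```

This divides the wall-clock time by at most the number of workers; it does not reduce the total work. To parallelize the original kmmd loop instead, the workers also need the package and the variables, e.g. clusterEvalQ(cl, library(kernlab)) and clusterExport(cl, c("cvg", "kernel")). On Linux or macOS, mclapply(myRange, window_stat, mc.cores = 2) does the same without the export step, because it forks the current R session. As for the error in the second call: in the original code each element of kmm.ls is already the two-value vector returned by mmdstats() (first- and third-order statistics), so calling mmdstats() on it again after the as.data.frame(as.matrix(...)) conversion is a likely culprit; something like myRange[which.max(sapply(kmm.ls, function(s) s[1]))], applied to the list returned by lapply, should give the position of the largest first-order statistic.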