I am trying to understand how to parallelize some of my code in R. In the following example I want to use k-means to cluster data with 2, 3, 4, 5 and 6 centers, while using 20 iterations. Here is the code:
library(parallel)
library(BLR)
data(wheat)
parallel.function <- function(i) {
  kmeans(X[1:100, 100], centers = ??, nstart = i)
}
out <- mclapply( c(5, 5, 5, 5), FUN=parallel.function )
How can we parallelize over both the iterations and the centers at the same time? And how do I keep track of the outputs, assuming I want to keep every k-means result across all iterations and centers, just to learn how?
This looked very simple to me at first ... and then I tried it. After a lot of monkey typing and face palming during my lunch break, however, I arrived at this:
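A minimal sketch of one way to run k-means for 2 to 6 centers in parallel with `mclapply`, not necessarily the exact call referred to above. Assumptions: `dat` is random stand-in data (swap in the rows of `X` from `data(wheat)` that you want to cluster), `nstart = 20` stands for the "20 iterations" in the question, and `cores` falls back to 1 on Windows, where forking is not available.

```r
library(parallel)

# Stand-in data (assumption): 100 observations, 10 variables.
# Replace with e.g. X[1:100, ] after library(BLR); data(wheat).
set.seed(1)
dat <- matrix(rnorm(100 * 10), nrow = 100)

centers <- 2:6

# mclapply forks, so more than one core only works on Unix-alikes.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One kmeans fit per number of centers, each with 20 random starts.
fits <- mclapply(
  centers,
  function(k) kmeans(dat, centers = k, nstart = 20),
  mc.cores = cores
)

# Name the list so each result can be traced back to its setting.
names(fits) <- paste0("centers_", centers)

# Quick look: total within-cluster sum of squares per fit.
sapply(fits, function(f) f$tot.withinss)
```

Each element of `fits` is a full `kmeans` object, so something like `fits$centers_3$cluster` gives the assignments for the three-center run.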
It looks right, though I didn't check how sensible the clustering was.
On reflection the command syntax seems sensible, although a lot of other stuff that failed seemed reasonable too... The examples in the help documentation are maybe not that great.
Hope it helps.
EDIT: As requested, here is that on two variables, nstart and centers:
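A sketch of how both variables could be varied at once, assuming `mcmapply` over a grid built with `expand.grid`. The `nstart` values 5, 10 and 20 and the stand-in matrix `dat` are assumptions for illustration only; swap in your own data and settings.

```r
library(parallel)

# Stand-in data (assumption): 100 observations, 10 variables.
set.seed(1)
dat <- matrix(rnorm(100 * 10), nrow = 100)

# Every combination of centers and nstart (hypothetical nstart values).
grid <- expand.grid(centers = 2:6, nstart = c(5, 10, 20))

# Forking is unavailable on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mcmapply walks the two columns in parallel; SIMPLIFY = FALSE keeps
# the result as a plain list of kmeans objects.
fits <- mcmapply(
  function(k, n) kmeans(dat, centers = k, nstart = n),
  grid$centers, grid$nstart,
  SIMPLIFY = FALSE, mc.cores = cores
)

# Label each fit with the setting that produced it.
names(fits) <- paste0("centers", grid$centers, "_nstart", grid$nstart)

# Keep a summary next to the settings for easy comparison.
grid$tot.withinss <- unname(sapply(fits, function(f) f$tot.withinss))
grid
```

Row `i` of `grid` and `fits[[i]]` describe the same run, which is one simple way to track all outputs across centers and starts.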
How'd you like them apples?