parallel k-means in R

Question

5.1k views Asked by hema At 06 December 2013 at 05:49

I am trying to understand how to parallelize some of my code in R. In the following example I want to use k-means to cluster data with 2, 3, 4, 5 and 6 centers, using 20 iterations for each. Here is the code:

library(parallel)
library(BLR)

data(wheat)

parallel.function <- function(i) {
    kmeans( X[1:100,100], centers=?? , nstart=i )
}

out <- mclapply( c(5, 5, 5, 5), FUN=parallel.function )

How can I parallelize over both the iterations and the centers at the same time? And how can I keep track of the outputs, assuming I want to keep every k-means result across all iterations and centers, just to learn how it is done?

There are 3 answers

korolevbin On 22 May 2014 at 21:41

You may use parallel to try K-Means from different random starting points on multiple cores.

The code below is an example. (K = the number of clusters in k-means, N = the number of random starting points, C = the number of cores you would like to use.)

suppressMessages( library("Matrix") )
suppressMessages( library("irlba") )
suppressMessages( library("stats") )
suppressMessages( library("cluster") )
suppressMessages( library("fpc") )
suppressMessages( library("parallel") )

# Run k-means from N random starting points, split across C cores
calcKMeans <- function(matrix, K, N, C){
  # Give each core roughly N/C random starts (at least one, since nstart = 0 fails)
  nstarts <- rep(max(1, N %/% C), C)
  results <- mclapply(nstarts, FUN=function(nstart) kmeans(matrix, K, iter.max=15, nstart=nstart), mc.cores=C)
  # Keep the solution with the smallest total within-cluster sum of squares
  tmp <- sapply(results, function(r){r[['tot.withinss']]})
  km <- results[[which.min(tmp)]]
  # km holds cluster, centers, totss, withinss, tot.withinss, betweenss, size
  return(km)
}

# Read the data from the file fin_uf, cluster it, and return the cluster centers
runKMeans <- function(fin_uf, K, N, C, 
                      #fout_center, fout_label, fout_size, 
                      fin_record=NULL, fout_prediction=NULL){
  uf = read.table(fin_uf)
  km = calcKMeans(uf, K, N, C)
  rm(uf)
  #write.table(km$cluster, file=fout_label, row.names=FALSE, col.names=FALSE)
  #write.table(km$centers, file=fout_center, row.names=FALSE, col.names=FALSE)
  #write.table(km$size, file=fout_size, row.names=FALSE, col.names=FALSE)
  str(km)

  # The component is named "centers"; km$center only worked through partial matching
  return(km$centers)
}

Hope it helps!
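As a rough, self-contained illustration of the same pattern, here is a minimal sketch that splits the random starts across workers and keeps the best fit. It is not the answer's code: `split_starts` is a hypothetical helper introduced here so that no starts are lost when N is not a multiple of C, and the built-in `iris` data stands in for a real input file.

```r
library(parallel)

# Hypothetical helper (not from the answer): split n_starts random starts
# as evenly as possible across n_workers, dropping workers that get none.
split_starts <- function(n_starts, n_workers) {
  base   <- n_starts %/% n_workers
  extra  <- n_starts %% n_workers
  counts <- rep(base, n_workers) + (seq_len(n_workers) <= extra)
  counts[counts > 0]
}

x <- as.matrix(iris[, 1:4])   # built-in data, used only for illustration
starts <- split_starts(n_starts = 10, n_workers = 4)

# Forking (mc.cores > 1) is not available on Windows, so fall back to 1 there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One kmeans() call per worker, each with its own share of the random starts.
fits <- mclapply(starts,
                 function(ns) kmeans(x, centers = 3, iter.max = 15, nstart = ns),
                 mc.cores = cores)

# Keep the fit with the smallest total within-cluster sum of squares.
best <- fits[[which.min(sapply(fits, function(f) f$tot.withinss))]]
```

Each worker already returns the best of its own starts, so the final `which.min` only has to compare one candidate per worker.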

quine On 02 May 2018 at 21:27

There's a CRAN package called knor, derived from a research paper, that improves performance by using a memory-efficient variant of Elkan's pruning algorithm. It's an order of magnitude faster than everything else in these answers.

install.packages("knor")
require(knor)
iris.mat <- as.matrix(iris[,1:4])
k <- length(unique(iris[, dim(iris)[2]])) # Number of unique classes
nthread <- 4
kms <- Kmeans(iris.mat, k, nthread=nthread)

Stephen Henderson On 06 December 2013 at 13:59 (Accepted Answer)

This looked very simple to me at first ... and then I tried it. After a lot of monkey typing and face palming during my lunch break, however, I arrived at this:

library(parallel)
library(BLR)

data(wheat)

# x is supplied by name, so each value of 2:6 is matched to `centers`
mc = mclapply(2:6, function(x,centers)kmeans(x, centers), x=X)

It looks right, though I didn't check how sensible the clustering was.

> summary(mc)
     Length Class  Mode
[1,] 9      kmeans list
[2,] 9      kmeans list
[3,] 9      kmeans list
[4,] 9      kmeans list
[5,] 9      kmeans list

On reflection, the command syntax seems sensible, although a lot of other things that failed seemed reasonable too. The examples in the help documentation are maybe not that great.

Hope it helps.
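For what it's worth, the one-liner with `2:6` works because of R's argument matching: arguments supplied by name are matched first, and the value being iterated over then fills the first parameter that is still unmatched. The toy function `f` below is made up purely to show this; it uses `lapply`, and `mclapply` forwards its extra arguments to `FUN` in the same way.

```r
# Toy function with the same parameter names as the kmeans wrapper.
f <- function(x, centers) paste("x =", x, "centers =", centers)

# `x` is fixed by name, so each element of 2:3 lands in `centers`.
res <- lapply(2:3, f, x = "data")

res[[1]]   # "x = data centers = 2"
res[[2]]   # "x = data centers = 3"
```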

EDIT: As requested, here is the same thing over two variables, nstart and centers:

(pars = expand.grid(i=1:3, cent=2:4))

  i cent
1 1    2
2 2    2
3 3    2
4 1    3
5 2    3
6 3    3
7 1    4
8 2    4
9 3    4

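L=list()
# yikes, horrible: append() turns each row of pars into a list, so that pars$cent and pars$i work below
pars2=apply(pars,1,append, L)
# one kmeans fit per row of the grid; x is fixed by name, so each row is matched to `pars`
mc = mclapply(pars2, function(x,pars)kmeans(x, centers=pars$cent,nstart=pars$i ), x=X)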

> summary(mc)
      Length Class  Mode
 [1,] 9      kmeans list
 [2,] 9      kmeans list
 [3,] 9      kmeans list
 [4,] 9      kmeans list
 [5,] 9      kmeans list
 [6,] 9      kmeans list
 [7,] 9      kmeans list
 [8,] 9      kmeans list
 [9,] 9      kmeans list

How'd you like them apples?
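On the "how to track the outputs" part of the question, one possible approach is to loop over the row indices of the parameter grid, name each fit after its parameters, and store a summary value next to the grid. This is only a sketch under assumptions: the built-in `iris` data stands in for `X` from the wheat data set, and the column names `nstart` and `centers` and the label format `k3_n2` are choices made here, not part of the answer.

```r
library(parallel)

x <- as.matrix(iris[, 1:4])   # stand-in for X, used only for illustration
pars <- expand.grid(nstart = 1:3, centers = 2:4)

# Forking (mc.cores > 1) is not available on Windows, so fall back to 1 there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One k-means fit per row of the grid, indexing rows directly.
fits <- mclapply(seq_len(nrow(pars)), function(j) {
  kmeans(x, centers = pars$centers[j], nstart = pars$nstart[j])
}, mc.cores = cores)

# Label each fit so it can be looked up later, e.g. fits[["k3_n2"]].
names(fits) <- paste0("k", pars$centers, "_n", pars$nstart)

# Record one summary number per fit beside the parameters that produced it.
pars$tot.withinss <- vapply(fits, function(f) f$tot.withinss,
                            numeric(1), USE.NAMES = FALSE)
```

The full kmeans objects stay in `fits`, while `pars` becomes a small table that is easy to sort or plot when comparing numbers of centers.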
