How to decide best number of clusters for kamila clustering with R?

Question

Asked by Emrah BILGIC on 24 May 2018 at 23:36

I have a mixed-type data set, so I wanted to try KAMILA clustering. Applying it is easy, but I would like a plot, similar to a knee plot, to help me decide the number of clusters.

library(kamila)

data <- read.csv("binarymat.csv", header = FALSE, sep = ";")

# Column 9 is continuous; drop = FALSE keeps it as a one-column data frame
conInd <- 9
conVars <- data.frame(scale(data[, conInd, drop = FALSE]))

# Columns 1 to 8 are categorical; convert each to a factor
catVarsFac <- data[, 1:8]
catVarsFac[] <- lapply(catVarsFac, factor)
catVarsDum <- dummyCodeFactorDf(catVarsFac)

kamRes <- kamila(conVars, catVarsFac, numClust = 5, numInit = 10,
                 calcNumClust = "ps", numPredStrCvRun = 10, predStrThresh = 0.5)
summary(kamRes)

It says that the best number of clusters is 5. How does it decide that, and can I see a plot that shows it?

**kangaroo_cliff** · Accepted Answer · 2018-05-25T00:24:42+00:00

From the kamila package documentation:

Setting calcNumClust to "ps" uses the prediction strength method of Tibshirani & Walther (J. of Comp. and Graphical Stats. 14(3), 2005). There is no perfect method for estimating the number of clusters; PS tends to give a smaller number than, say, BIC based methods for large sample sizes.
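
To give a feel for what prediction strength measures, here is a minimal sketch for a single candidate k. It is not kamila's implementation: it uses plain k-means on numeric data as a stand-in, and the function name predStrength is made up for illustration. The idea is to cluster a training half and a test half separately, then check how often pairs of points that share a test cluster are also placed together by the training centroids.

```r
# Illustrative only: prediction strength for one candidate k, using
# k-means on numeric data as a stand-in for kamila's mixed-type clustering.
predStrength <- function(x, k, nstart = 10) {
  x <- as.matrix(x)
  idx <- sample(nrow(x), floor(nrow(x) / 2))
  train <- x[idx, , drop = FALSE]
  test <- x[-idx, , drop = FALSE]

  fitTrain <- kmeans(train, centers = k, nstart = nstart)
  fitTest <- kmeans(test, centers = k, nstart = nstart)

  # Squared distance from every test point to every training centroid
  d <- sapply(seq_len(k), function(j)
    rowSums(sweep(test, 2, fitTrain$centers[j, ])^2))
  # Label each test point with its nearest training centroid
  predicted <- max.col(-d, ties.method = "first")

  # For each test cluster: share of point pairs that the training
  # centroids also put in the same cluster
  props <- sapply(seq_len(k), function(j) {
    members <- predicted[fitTest$cluster == j]
    n <- length(members)
    if (n < 2) return(NA_real_)
    sum(choose(table(members), 2)) / choose(n, 2)
  })

  # Prediction strength is the worst case over the test clusters
  min(props, na.rm = TRUE)
}
```

In practice the value is averaged over several random splits, and the estimate is the largest k whose prediction strength stays at or above a threshold. kamila's numPredStrCvRun and predStrThresh arguments appear to play those two roles.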

In your call, you specified only one value for numClust, so no selection is actually taking place: you have already picked the number of clusters yourself.

To select the number of clusters, you have to specify the range you are interested in, for example numClust = 2:7, along with the method for selecting the number of clusters.

If you also want to select the number of clusters, something like the following might work.

kamRes <- kamila(conVars, catVarsFac, numClust = 2 : 7, numInit = 10, 
          calcNumClust = "ps", numPredStrCvRun = 10, predStrThresh = 0.5)

Information on the selection of the number of clusters is then available in kamRes$nClust, and plot(2:7, kamRes$nClust$psValues) could be what you are after.
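
A slightly fuller version of that plot, as a sketch: it assumes kamRes was fitted with numClust = 2:7 and predStrThresh = 0.5 as in the call above, and relies only on kamRes$nClust$psValues. The helper name plotPredStr is hypothetical. Drawing the threshold as a dashed line makes it easier to see which candidates pass.

```r
# Hypothetical helper: plot prediction strength against the candidate
# numbers of clusters, with the threshold as a dashed reference line.
plotPredStr <- function(numClust, psValues, thresh = 0.5) {
  stopifnot(length(numClust) == length(psValues))
  plot(numClust, psValues, type = "b", pch = 19,
       ylim = range(c(0, 1, psValues, thresh), na.rm = TRUE),
       xlab = "Number of clusters",
       ylab = "Prediction strength")
  abline(h = thresh, lty = 2)

  # Largest candidate at or above the threshold (NA if none qualifies)
  ok <- numClust[!is.na(psValues) & psValues >= thresh]
  if (length(ok) > 0) max(ok) else NA
}

# Usage with the fit from above (requires the kamila package):
# plotPredStr(2:7, kamRes$nClust$psValues, thresh = 0.5)
```

The returned value follows the usual prediction strength rule and is meant as a visual cross-check; it may not match kamila's internal choice in every case, so compare it with the number the package itself reports.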
