I have a mixed-type data set, so I wanted to try KAMILA clustering (the kamila package in R).
It is easy to apply, but I would like a plot, similar to a knee plot, to help decide the number of clusters.
library(kamila)

data <- read.csv("binarymat.csv", header = FALSE, sep = ";")
# Column 9 is the continuous variable; standardise it
conInd <- c(9)
conVars <- data[, conInd]
conVars <- data.frame(scale(conVars))
# Columns 1 to 8 are categorical; convert each one to a factor
catVarsFac <- data[, c(1,2,3,4,5,6,7,8)]
catVarsFac[] <- lapply(catVarsFac, factor)
catVarsDum <- dummyCodeFactorDf(catVarsFac)
kamRes <- kamila(conVars, catVarsFac, numClust = 5, numInit = 10,
                 calcNumClust = "ps", numPredStrCvRun = 10, predStrThresh = 0.5)
summary(kamRes)
It says that the best number of clusters is 5. How does it decide that and can I see a plot indicating this?
This is covered in the kamila package documentation. In the call as you are using it, you have specified only one value for numClust, so you are not actually selecting the number of clusters: you have already picked one.
To select the number of clusters, you have to specify the range you are interested in, for example numClust = 2:7, and also the method for selecting the number of clusters (calcNumClust = "ps", as in your call).
If you also want to select the number of clusters, rerunning your kamila call with numClust = 2:7 in place of numClust = 5 might work.
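A minimal sketch of such a call follows. So that it runs on its own, it builds a small made-up data set (8 binary factors and 1 scaled continuous column) as a stand-in for binarymat.csv. With your data, keep your own conVars and catVarsFac and only change numClust. The toy data, the seed, and the variable grp are assumptions for illustration, not part of the question.

```r
library(kamila)

set.seed(1)  # kamila uses random initialisations, so fix the seed

# Made-up stand-in for the question's data (an assumption for illustration):
# two latent groups, 8 binary factors V1..V8, 1 standardised continuous column V9.
n <- 200
grp <- rep(1:2, each = n / 2)
catVarsFac <- data.frame(setNames(
  lapply(1:8, function(j) factor(rbinom(n, 1, ifelse(grp == 1, 0.2, 0.8)))),
  paste0("V", 1:8)
))
conVars <- data.frame(V9 = as.numeric(scale(rnorm(n, mean = 3 * grp))))

# Same call as in the question, but with a range of candidate cluster counts
kamRes <- kamila(conVars, catVarsFac,
                 numClust = 2:7,          # candidates instead of a single value
                 numInit = 10,
                 calcNumClust = "ps",     # prediction strength method
                 numPredStrCvRun = 10,
                 predStrThresh = 0.5)

print(kamRes$nClust$bestNClust)  # number of clusters selected by kamila
print(kamRes$nClust$psValues)    # one prediction strength value per candidate
```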
Information on the selection of the number of clusters is then available in kamRes$nClust, and plot(2:7, kamRes$nClust$psValues) could be what you are after.
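To make that plot read more like a knee plot, it may help to draw the threshold as a horizontal line. The helper below is a hypothetical base R sketch (plotPredStrength is not a kamila function). It also returns the largest candidate whose prediction strength reaches the threshold, which is my understanding of the usual prediction strength rule. kamila's own rule may differ in detail, for example by taking the standard error into account, so compare the result with kamRes$nClust$bestNClust. The numbers in the example call are made up purely to show the usage.

```r
# Hypothetical helper (not part of kamila): plot prediction strength against
# the candidate numbers of clusters, with the threshold as a dashed line.
plotPredStrength <- function(nClustRange, psValues, thresh = 0.5) {
  stopifnot(length(nClustRange) == length(psValues))
  plot(nClustRange, psValues, type = "b", ylim = c(0, 1),
       xlab = "Number of clusters", ylab = "Prediction strength")
  abline(h = thresh, lty = 2)
  # Assumed rule: largest candidate at or above the threshold, NA if none
  ok <- nClustRange[psValues >= thresh]
  if (length(ok) > 0) max(ok) else NA
}

# With a kamila fit over numClust = 2:7 and predStrThresh = 0.5:
#   plotPredStrength(2:7, kamRes$nClust$psValues, thresh = 0.5)

# Made-up values, only to show the call
bestK <- plotPredStrength(2:7, c(0.9, 0.7, 0.6, 0.55, 0.4, 0.3))
```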