I found a nice tutorial of self organizing map clustering in R in which, it is explained how to display your input data in the unit space (see below). In order to set up some rules for the labeling, I would like to compute the probability of each class in each neuron and plot it. Computing the probability is rather easy: take for each unit the number of observations of class i and divide it by the total number of observations in this unit. I end up with data.frame pc. Now I struggle to map this result, any clue on how to do it?
library(kohonen)
data(yeast)
set.seed(7)
yeast.supersom <- supersom(yeast, somgrid(8, 8, "hexagonal"),whatmap = 3:6)
classes <- levels(yeast$class)
colors <- c("yellow", "green", "blue", "red", "orange")
par(mfrow = c(3, 2))
plot(yeast.supersom, type = "mapping",pch = 1, main = "All", keepMargins = TRUE,bgcol = gray(0.85))
library(plyr)
pc <- data.frame(Var1=c(1:64))
for (i in seq(along = classes)) {
X.class <- lapply(yeast, function(x) subset(x, yeast$class == classes[i]))
X.map <- map(yeast.supersom, X.class)
plot(yeast.supersom, type = "mapping", classif = X.map,
col = colors[i], pch = 1, main = classes[i], add=TRUE)
# compute percentage per unit
v1F <- levels(as.factor(X.map$unit.classif))
v2F <- levels(as.factor(yeast.supersom$unit.classif))
fList<- base::union(v2F,v1F)
pc <- join(pc,as.data.frame(table(factor(X.map$unit.classif,levels=fList))/table(factor(yeast.supersom$unit.classif,levels=fList))*100),by = 'Var1')
colnames(pc)[NCOL(pc)]<-classes[i]
}
OKay guys here is a solution: Once I have computed the probability, it derives a color code from a defined gradient (
rbPal
). The gradient is defined by a upper and a lower bound and the shade of the colors are proportional to their interval. THis is done with the functionfindInterval
.