I would like to plot the clusters in a dendrogram with the same colours as the silhouette plot and the cluster plot produced by factoextra functions, as illustrated by the reproducible example below.
In addition, I did not find any easy way to add a legend and/or cluster labels to a dendrogram in ggplot2 style.
library(tidyverse)
library(ggpubr)
library(dendextend)
library(FactoMineR)
library(factoextra)
library(scales)
data(iris)
# Remove species column (5) and scale the data
iris.scaled <- scale(iris[, -5])
# Compute hcut and cut the tree
k <- 5
hc.cut <- hcut(iris.scaled, k = k, hc_method = "complete")
# Visualize silhouette information
fviz_silhouette(hc.cut)
#>   cluster size ave.sil.width
#> 1       1   42          0.45
#> 2       2    7          0.57
#> 3       3   24          0.42
#> 4       4   66          0.25
#> 5       5   11          0.34
# Visualize clusters
fviz_cluster(hc.cut, ellipse.type = "convex") +
  theme_minimal()
# Visualize dendrogram (wrong colours!)
fviz_dend(hc.cut, cex = 0.4)
#> Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
#> of ggplot2 3.3.4.
#> ℹ The deprecated feature was likely used in the factoextra package.
#> Please report the issue at <https://github.com/kassambara/factoextra/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
It turns out that fviz_dend assigns colours to the clusters from left to right along the tree, not according to the original cluster numbers.
I found a solution that requires converting the hcut object to a dendrogram object, reordering the cluster numbers and modifying the palette in the fviz_dend function:
# Convert hcut object to dendrogram object
dend <- as.dendrogram(hc.cut)
# Cluster membership in the dendrogram order
clusters <- hc.cut$cluster[order.dendrogram(dend)]
# Rebuild palette of ordered colours
show_col(hue_pal()(5))
cols <- hue_pal()(5)[unique(clusters)]
names(cols) <- unique(clusters)
# As there is no aesthetic for clusters to retrieve from fviz_dend(),
# this solution does not work:
fviz_dend(dend, k = k, cex = 0.4) +
  scale_colour_manual(values = cols,
                      aesthetics = c("colour", "fill"))
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
# Visualize dendrogram with correct colours (3 options)
fviz_dend(dend, k = k, palette = cols, cex = 0.4)
fviz_dend(hc.cut, palette = cols, cex = 0.4)
fviz_dend(hc.cut, k_colors = cols, cex = 0.4)
Is there a simpler way to force this function to use the cluster numbers to colour the branches of the tree?
In addition, I would like to add cluster labels to this ggplot, as can be done in a base plot using a dendextend function:
# Base plot with cluster labels
dend %>%
  color_branches(k = 5,
                 col = cols,
                 groupLabels = unique(clusters)) %>%
  plot()
Looking at the source code of the fviz_* functions, you will see that fviz_silhouette uses the cluster slot to color the bars, whereas fviz_dend first transforms your hclust object into a dendrogram and then colors the groups from left to right. As you observed, the groups of the dendrogram are not in the order of the cluster numbers but in the order determined by order.dendrogram.
Thus, your only option is to determine this order and reorder your palette accordingly, either in fviz_dend as you have shown, or in fviz_silhouette by adding + scale_colour_manual(values = <some permutation of scales::hue_pal()(5)>, aesthetics = c("colour", "fill")).
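Both reorderings can be derived from the tree itself. The following is only a minimal sketch in base R: cluster_palettes is a hypothetical helper (it is not part of factoextra or dendextend), rainbow() stands in for whatever palette you actually use, and the commented calls at the end assume that fviz_dend and fviz_silhouette behave as described above.

```r
# Hypothetical helper, not part of factoextra or dendextend.
# hc:      an hclust object (an hcut object should also work, since it
#          carries the same merge/order components)
# k:       number of clusters
# palette: vector of k colours, indexed by cluster number
cluster_palettes <- function(hc, k, palette) {
  # Cluster number of each observation, in the original data order
  membership <- stats::cutree(hc, k = k)
  # Cluster numbers in the order they appear along the tree, left to right
  left_to_right <- unique(membership[hc$order])
  list(
    # For the dendrogram: the i-th group from the left receives the colour
    # that belongs to its cluster number
    dend = stats::setNames(palette[left_to_right], left_to_right),
    # For the silhouette plot: cluster c receives the colour that a
    # left-to-right assignment of `palette` would give it in the tree
    silhouette = stats::setNames(palette[match(seq_len(k), left_to_right)],
                                 seq_len(k))
  )
}

hc <- hclust(dist(scale(iris[, -5])), method = "complete")
k <- 5
pal <- grDevices::rainbow(k)  # stand-in; e.g. scales::hue_pal()(k) in practice
pals <- cluster_palettes(hc, k, pal)
print(pals)

# Assumed usage with the objects from the question (not run here):
# fviz_dend(hc.cut, k_colors = pals$dend, cex = 0.4)
# fviz_silhouette(hc.cut) +
#   scale_colour_manual(values = pals$silhouette,
#                       aesthetics = c("colour", "fill"))
```

Only one of the two commented calls is needed: the first changes the dendrogram to match the other plots, the second changes the silhouette plot to match the default dendrogram.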