How to assign consistent colours to clusters with factoextra and dendextend R functions?


I would like to plot the clusters in a dendrogram with the same colours as the silhouette plot and the cluster plot produced by factoextra functions, as illustrated by the reproducible example below.
In addition, I could not find an easy way to add a legend and/or cluster labels to a dendrogram in ggplot2 style.


library(tidyverse)
library(ggpubr)
library(dendextend)
library(FactoMineR)
library(factoextra)
library(scales)

data(iris)

# Remove species column (5) and scale the data
iris.scaled <- scale(iris[, -5])
# Hierarchical clustering (complete linkage), tree cut into k clusters
k <- 5
hc.cut <- hcut(iris.scaled, k = k, hc_method = "complete")

# Visualize silhouette information
fviz_silhouette(hc.cut)
#>   cluster size ave.sil.width
#> 1       1   42          0.45
#> 2       2    7          0.57
#> 3       3   24          0.42
#> 4       4   66          0.25
#> 5       5   11          0.34

Silhouette plot
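For readers without factoextra at hand, here is a rough base-R sketch of what the hcut() call and the silhouette table above compute. It assumes hcut()'s defaults (Euclidean distance, stats::hclust()) and uses silhouette() from the cluster package, which ships with standard R installations; it is not necessarily how factoextra computes these values internally, so the numbers are not guaranteed to match the table exactly.

```r
library(cluster)  # for silhouette(); a recommended package bundled with R

# Assumed base-R equivalent of hcut(iris.scaled, k = 5, hc_method = "complete")
iris.scaled <- scale(iris[, -5])   # drop Species, standardise the columns
k  <- 5
d  <- dist(iris.scaled, method = "euclidean")
hc <- hclust(d, method = "complete")
cl <- cutree(hc, k = k)            # integer cluster id (1..k) for each of the 150 rows

# Silhouette width of every observation, then a per-cluster summary
sil     <- silhouette(cl, d)
sil_sum <- summary(sil)
data.frame(cluster       = as.integer(names(sil_sum$clus.avg.widths)),
           size          = as.vector(sil_sum$clus.sizes),
           ave.sil.width = round(as.vector(sil_sum$clus.avg.widths), 2))
```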

# Visualize clusters
fviz_cluster(hc.cut, ellipse.type = "convex")+
  theme_minimal()

Cluster plot

# Visualize dendrogram (wrong colours!)
fviz_dend(hc.cut, cex = 0.4)
#> Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
#> of ggplot2 3.3.4.
#> ℹ The deprecated feature was likely used in the factoextra package.
#>   Please report the issue at <https://github.com/kassambara/factoextra/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

Dendrogram

It turns out that fviz_dend() assigns the colours to the clusters from left to right along the tree, not according to the original cluster numbers.
I found a solution that requires converting the hcut object to a dendrogram object, reordering the cluster numbers and modifying the palette passed to fviz_dend():


# Convert hcut object to dendrogram object
dend <- as.dendrogram(hc.cut)
# Cluster membership in the dendrogram order
clusters <- hc.cut$cluster[order.dendrogram(dend)]

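# Default ggplot2 hue palette for k clusters
show_col(hue_pal()(k))

# Reorder the palette: the i-th colour is the colour of the i-th cluster met from left to right
cols <- hue_pal()(k)[unique(clusters)]
names(cols) <- unique(clusters)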

# fviz_dend() does not expose a cluster aesthetic that a scale could map,
# so this attempt does not work:
fviz_dend(dend, k = k,  cex = 0.4) + 
  scale_colour_manual(values = cols, 
                      aesthetics = c("colour", "fill"))
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.

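The reordering step (indexing the hue palette by unique(clusters)) can be checked without factoextra. The sketch below rebuilds the same tree with stats::hclust(), assumes, as observed above, that fviz_dend() gives the i-th colour of its palette to the i-th group from the left, and verifies that every leaf then receives the colour of its original cluster number. To stay dependency-free it approximates scales::hue_pal()(k) with grDevices::hcl() (an assumption); in the question's code you would keep hue_pal(). The names leaf_cl and left_to_right are illustrative only.

```r
k    <- 5
hc   <- hclust(dist(scale(iris[, -5])), method = "complete")
cl   <- cutree(hc, k = k)                 # original cluster ids, 1..k
dend <- as.dendrogram(hc)

# Approximation of scales::hue_pal()(k): evenly spaced hues, chroma 100, luminance 65
pal <- hcl(h = seq(15, 375, length.out = k + 1)[1:k], c = 100, l = 65)

# Cluster ids in the order their groups appear from left to right in the tree
leaf_cl       <- cl[order.dendrogram(dend)]
left_to_right <- unique(leaf_cl)

# Reordered palette: position i holds the colour of cluster left_to_right[i]
cols <- pal[left_to_right]
names(cols) <- left_to_right

# Simulate left-to-right colouring: each leaf takes the colour at its group's position
leaf_col <- cols[match(leaf_cl, left_to_right)]
stopifnot(identical(unname(leaf_col), pal[leaf_cl]))  # same as colouring by cluster id
```

Because a cut of a dendrogram always yields contiguous blocks of leaves, unique() on the leaf order is enough to recover the left-to-right group order.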


# Visualize dendrogram with correct colours (3 options)
fviz_dend(dend, k = k, palette = cols, cex = 0.4)



fviz_dend(hc.cut, palette = cols, cex = 0.4)



fviz_dend(hc.cut, k_colors = cols, cex = 0.4)


Is there a simpler way to make fviz_dend() use the cluster numbers to colour the branches of the tree?

I would also like to add cluster labels to this ggplot, as can be done in a base plot with the dendextend function color_branches():


# Base plot with cluster labels
dend %>% 
  color_branches(k = 5, 
                 col = cols, 
                 groupLabels = unique(clusters)) %>% 
  plot()

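If a base-graphics plot is acceptable, a dendextend-free alternative (a different technique from the color_branches() call above) is stats::rect.hclust(), which draws one box per cluster from left to right and recycles its border colours in that order. The sketch below reuses the left-to-right palette idea and then writes the original cluster number above each box with text(). The label positions are computed by hand, assuming plot.hclust() places the leaves at x = 1..n; the palette again approximates scales::hue_pal() with hcl().

```r
k   <- 5
hc  <- hclust(dist(scale(iris[, -5])), method = "complete")
cl  <- cutree(hc, k = k)
pal <- hcl(h = seq(15, 375, length.out = k + 1)[1:k], c = 100, l = 65)  # ~ scales::hue_pal()(k)

left_to_right <- unique(cl[hc$order])   # cluster ids, left to right
cols <- pal[left_to_right]              # colour of each group, left to right

plot(hc, labels = FALSE, hang = -1, main = "", xlab = "", sub = "")
# rect.hclust() draws the boxes left to right and uses `border` in that order,
# so the reordered palette keeps each cluster's colour tied to its number
groups <- rect.hclust(hc, k = k, border = cols)

# Write the original cluster number above each box
sizes <- as.vector(table(cl)[as.character(left_to_right)])  # group sizes, left to right
ends  <- cumsum(sizes)
mids  <- ends - sizes / 2 + 0.5                  # x centre of each block of leaves
y_top <- mean(rev(hc$height)[(k - 1):k])         # height rect.hclust() uses for the box tops
text(mids, y_top, labels = left_to_right, col = cols, pos = 3, font = 2)
```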


Answer by thothal

Looking at the source code of the fviz_* functions, you will see that fviz_silhouette() uses the cluster slot to colour the bars, whereas fviz_dend() first converts your hclust object into a dendrogram and then colours the groups from left to right. As you observed, the groups of the dendrogram are not in the order of the cluster numbers; their order is determined by order.dendrogram.

Thus, your only option is to determine this order and reorder your palette accordingly.

You can do this either in fviz_dend(), as you have shown, or in fviz_silhouette() by adding + scale_colour_manual(values = <some permutation of scales::hue_pal()(5)>, aesthetics = c("colour", "fill")).
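For the fviz_silhouette() route, the required permutation is the inverse of the one used for fviz_dend(): cluster j must receive the colour whose index is j's left-to-right position in the tree. A minimal base-R sketch, again approximating scales::hue_pal()(k) with hcl(); the names sil_cols and left_to_right are illustrative, and the final check simulates the default left-to-right dendrogram colouring rather than calling factoextra.

```r
k   <- 5
hc  <- hclust(dist(scale(iris[, -5])), method = "complete")
cl  <- cutree(hc, k = k)
pal <- hcl(h = seq(15, 375, length.out = k + 1)[1:k], c = 100, l = 65)  # ~ scales::hue_pal()(k)

leaf_cl       <- cl[hc$order]        # cluster id of each leaf, left to right
left_to_right <- unique(leaf_cl)     # cluster ids in left-to-right group order

# Default dendrogram colouring: the i-th group from the left gets pal[i].
# For a plot coloured by cluster id, cluster j therefore needs pal[position of j].
sil_cols <- pal[match(seq_len(k), left_to_right)]
names(sil_cols) <- seq_len(k)        # names match cluster factor levels "1".."k"

# Check: colouring leaves by cluster id with sil_cols reproduces the dendrogram colours
stopifnot(identical(unname(sil_cols[leaf_cl]),
                    pal[match(leaf_cl, left_to_right)]))
```

With factoextra loaded, the vector would then be used as fviz_silhouette(hc.cut) + scale_colour_manual(values = sil_cols, aesthetics = c("colour", "fill")), which should match the dendrogram drawn with fviz_dend()'s default colours.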