I am interested in visualizing the results of a hierarchical cluster analysis. Is it possible to use a dendrogram to display the names or labels of clusters (and subclusters) without displaying the original cases that went into the cluster analysis?
For example, this code applies a hierarchical cluster analysis to the mtcars dataset.
library(factoextra)  # provides get_dist()
data("mtcars")
clust <- hclust(get_dist(mtcars, method = "pearson"), method = "complete")
plot(clust)
Let's say I cut the tree at 4 clusters and rename the clusters "sedan", "truck", "sportscar", and "van" (totally arbitrary labels).
clust1 <- cutree(clust,4)
clust1 <- dplyr::recode(clust1,
                        '1' = 'sedan',
                        '2' = 'truck',
                        '3' = 'sportscar',
                        '4' = 'van')
Is it possible to display a dendrogram that shows these four labels as the nodes at the bottom of the tree, suppressing the original car names?
I am also interested in displaying subclusters within clusters in a similar way, but that may be outside the scope of this question. Bonus points if you can also give a suggestion for how to display subclusters within clusters in a dendrogram while suppressing the names of the original cases! :)
Thank you in advance!
Yes, you can do this. I do not understand your get_dist, so I will illustrate using the ordinary distance function dist.
To cut off and display just the top of the tree, convert it to a dendrogram, call cut() on it, and keep the upper component of the result. But you need to know what height to cut it at. The merge heights are stored in clust$height.
Since you want four branches, you can cut at any height between the third and fourth heights (from the end). I will use 213.
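The steps so far can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code behind the answer: it assumes Euclidean distance from dist() in place of the question's get_dist(), and it computes a cut height as the midpoint of the third and fourth largest merge heights instead of hard-coding 213. The names h and cut_height are illustrative.

```r
# Assumption: plain Euclidean dist() stands in for the question's get_dist().
data("mtcars")
clust <- hclust(dist(mtcars), method = "complete")

# Merge heights, largest first; the last few merges decide where to cut.
h <- sort(clust$height, decreasing = TRUE)
head(h)

# Any height strictly between the 3rd and 4th largest merge heights
# leaves exactly four branches. The midpoint is an illustrative choice;
# a fixed value inside that interval works just as well.
cut_height <- mean(h[3:4])

# cut() on a dendrogram returns a list with $upper (the top of the tree)
# and $lower (one sub-dendrogram per branch).
TreeTop <- cut(as.dendrogram(clust), h = cut_height)$upper
labels(TreeTop)  # placeholder leaf labels generated by cut()
```

Computing the height from clust$height avoids reading it off by eye, and it keeps working if the distance measure or the data change.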
You can get the basic plot now with plot(TreeTop), but it won't have the labels that you want. To change the labels, use the package dendextend, which offers a tool specifically to change the labels.
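A possible way to do the relabeling is sketched below. It assumes dendextend is installed and uses its labels<- replacement method for dendrograms. The tree is rebuilt here so the block runs on its own, again with dist() standing in for get_dist() and an illustrative midpoint cut height. The name pieces is hypothetical.

```r
library(dendextend)  # supplies `labels<-` for dendrogram objects

# Rebuild the truncated tree (assumption: Euclidean dist(), midpoint cut).
data("mtcars")
clust <- hclust(dist(mtcars), method = "complete")
h <- sort(clust$height, decreasing = TRUE)
pieces <- cut(as.dendrogram(clust), h = mean(h[3:4]))
TreeTop <- pieces$upper

# Inspect which cars fall under each branch before choosing names.
lapply(pieces$lower, labels)

# Labels are assigned to the leaves in left-to-right plotting order.
# These are the arbitrary names from the question.
labels(TreeTop) <- c("sedan", "truck", "sportscar", "van")
plot(TreeTop)
```

Note that the leaf order of the plotted tree need not match the cluster numbers returned by cutree(), so it is worth checking the members of each branch before attaching a name to it. Cutting at a lower height in the same way would expose subclusters as additional leaves.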