Getting leaf names under branches for a given depth cutoff


For a given depth cutoff in a dendrogram, how can I get, for each branch below that cutoff, a list of the names of all the leaves that descend from it?

For example I create this dendrogram:

set.seed(1)
mat <- matrix(rnorm(100*10),nrow=100,ncol=10)
dend <- as.dendrogram(hclust(dist(t(mat))))

Plotting it using dendextend:

require(dendextend)
dend %>% plot

And defining the depth cutoff as 14.5:

abline(h=14.5,col="red")


The resulting list should be:

list(c(5),c(7),c(8),c(10,4,9),c(3,6,1,2))

There are 2 answers

SamPassmore On

Not entirely sure if this is the answer you are after, but can you just access them like this?

acme$Accounting$children %>% names()
# "New Software"             "New Accounting Standards"

acme$IT$children %>% names()
# "Outsource"   "Go agile"    "Switch to R"

Presumably you want to do this automatically, in which case it would be something like:

# Avoid calling the vector `names`, which shadows the base function used below
branch_names = c('Accounting', 'IT')
sapply(branch_names, function(x) acme[[x]]$children %>% names(.))

There is probably a more elegant way to do this, but this doesn't look like a terrible one.

EDIT

Since the question has been completely changed, here is a new answer:

# Return the "height" attribute of a dendrogram node
get_height = function(x){
  attr(x, "height")
}

height = 14
dendrapply(dend, function(x) ifelse(get_height(x) < height, x, '')) %>% unlist()

The idea is to read the height attribute of each node in the dendrogram and check whether it is above or below the cutoff you want. Unfortunately this won't group together the leaf nodes that come from the same parent; however, that shouldn't be too difficult to add with a bit of tinkering. Hopefully this gets you on your way.
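One possible way to get the per-branch grouping, sketched here as an alternative rather than as this answer's own method, is base R's `cut()` method for dendrograms. It returns a list with `upper` and `lower` components, where `lower` holds the sub-dendrograms hanging below the cut height, and `labels()` then lists the leaves of each one. The variable names `cutoff`, `branches` and `leaves_by_branch` are made up for illustration.

```r
# Rebuild the dendrogram from the question (base R only)
set.seed(1)
mat <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
dend <- as.dendrogram(hclust(dist(t(mat))))

cutoff <- 14.5  # the height used in the question

# cut() splits the tree at the given height; $lower is a list with
# one sub-dendrogram per branch found below the cut
branches <- cut(dend, h = cutoff)$lower

# labels() returns the leaf labels of each sub-dendrogram
leaves_by_branch <- lapply(branches, labels)
leaves_by_branch
```

Each list element should then hold the leaves of one branch below the red line, in the left-to-right order of the plot.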

dan On
set.seed(1)
mat <- matrix(rnorm(100*10),nrow=100,ncol=10)
dend <- as.dendrogram(hclust(dist(t(mat))))

require(dendextend)
dend %>% plot
abline(h=14.5,col="red")

The cutree function in dendextend accepts a height cutoff value and will return an integer vector with group memberships:

> cutree(dend,h=14.5)
 1  2  3  4  5  6  7  8  9 10 
 1  1  1  2  3  1  4  5  2  2
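To turn that membership vector into the list the question asks for, one option is base R's `split()`. The sketch below hard-codes the vector printed above so that it runs without dendextend; in practice `groups` would come straight from `cutree(dend, h = 14.5)`. The name `leaves_by_group` is only illustrative.

```r
# Membership vector copied from the cutree() output above:
# names are the leaf labels, values are the group ids
groups <- setNames(c(1, 1, 1, 2, 3, 1, 4, 5, 2, 2), 1:10)

# Group the leaf labels by their group id
leaves_by_group <- split(names(groups), groups)
leaves_by_group
```

Note that the groups are ordered by group id and the leaves within a group by their position in the vector, so the order can differ from the left-to-right order in the plot.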