Number of Connected Nodes in a dendrogram

I just started working with the tidygraph and ggraph packages and have a relatively simple problem that, oddly, I cannot find an easy solution to. Within a network, how many nodes are connected downstream from a single parent? It seems like a simple question, but I have struggled to arrive at an answer, especially when there are multiple levels of "parent/child" relationships that need to be unfolded.

# reproducible example -----------------

library(tidygraph)
library(ggraph)
library(tidyverse)

parent_child <- tribble(
  ~parent, ~child,
        "a", "b",
        "b", "c",
        "b", "d",
        "d", "e",
        "d", "f",
        "d", "g",
        "g", "z"
)

# converted to a dendrogram ------------

parent_child %>% 
  as_tbl_graph() %>% 
  ggraph(layout = "dendrogram") +
  geom_node_point() +
  geom_node_text(aes(label = name),
                 vjust = -1,
                 hjust = -1) +
  geom_edge_elbow()

This produces a dendrogram of the network.

What I want to know: how many nodes are connected to node "b" when moving out/down (ignoring node "a")? The answer I would expect is 6, or 7 if "b" itself is included.

I am running this over a network of about 5000 individuals, so filtering individual nodes by name is not a great solution. No one else in my office is familiar with network analysis, so I have been left on my own to figure this out. I really hope someone has an insight! In the meantime I will keep reviewing the problem and possible solutions :) Thank y'all!

Answer by lotus (best answer)

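One way is to use ego_size() from the igraph package. It needs an order argument (the maximum number of steps to follow from the starting node), so you can pass the edge count, ecount(g): no simple path can be longer than the number of edges, so this covers the whole neighborhood. Setting mode = "out" follows edges from parent to child only, and mindist = 1 leaves the starting node itself out of the count.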

library(igraph)

g <- graph_from_data_frame(parent_child)

ego_size(g, order = ecount(g), nodes = "b", mode = "out", mindist = 1)

[1] 6

For multiple nodes, just pass a vector of the nodes of interest:

ego_size(g, order = ecount(g), nodes = c("b", "d", "g"), mode = "out", mindist = 1)

[1] 6 4 1
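
Since the real network has about 5000 nodes, the same call can be run over every vertex at once instead of naming nodes by hand. The following is only a sketch under a few assumptions: the edge list is rebuilt with a plain data.frame so the snippet runs on its own, and the name n_desc is introduced here for illustration. The last line cross-checks one node with igraph's subcomponent(), which returns everything reachable from a vertex, including the vertex itself.

```r
library(igraph)

# Same edge list as in the question, rebuilt here so the snippet is self-contained.
parent_child <- data.frame(
  parent = c("a", "b", "b", "d", "d", "d", "g"),
  child  = c("b", "c", "d", "e", "f", "g", "z")
)
g <- graph_from_data_frame(parent_child)

# Number of downstream nodes for every vertex, labelled by vertex name.
# n_desc is a hypothetical name used only in this example.
n_desc <- setNames(
  ego_size(g, order = ecount(g), nodes = V(g), mode = "out", mindist = 1),
  V(g)$name
)

# Look up any nodes of interest by name.
n_desc[c("b", "d", "g")]

# Cross-check for "b": subcomponent() includes "b" itself, so subtract 1.
length(subcomponent(g, "b", mode = "out")) - 1
```

Leaf nodes such as "c" or "z" get a count of 0, and the result can be joined back onto a node table by name if needed.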