Get subgraph of shortest path between n nodes

2.9k views Asked by At

I have an unweighted graph and I want to get a subgraph that has just the nodes and edges that contain the shortest paths between n known nodes. In this case 3 nodes (11, 29, & 13 are the names).

Question

How can I get a subgraph of shortest path between n nodes in R?

MWE

library(ggraph)
library(igraph)

hs <- highschool[highschool$year == '1958',]
set.seed(11)
graph <- graph_from_data_frame(hs[sample.int(nrow(hs), 60),])


# plot using ggraph
ggraph(graph, layout = 'kk') + 
    geom_edge_fan() + 
    geom_node_text(aes(label = name)) 

enter image description here

Desired Output

The desired output would be the following green subgraph (Or close, I'm eyeballing the graph above and visually picking out what would be the subgraph) ignoring/removing the other nodes and edges.

enter image description here

2

There are 2 answers

0
nJGL On

You need a matrix of shortest paths to then create a sub-graph using a union of all edges belonging to those paths.

Let key vertices be those vertices between which your desired sub-graph appears. You say you have three such key vertices.

Consider that the shortest path between any i and j of them is unlist(shortest_paths(g, i, j, mode="all", weights=NULL)$vpath). You'd want to list all i-j combinations (1-2, 1-3, 2-3 in your case) of your key-verticies, and then list all vertices that appear on the paths between them. Sometime, surely, the same vertices appear on the shortest paths of more than one of your ij-pairs (See betweenness centrality). Your desired subgraph should include only these vertices, which you can give to induced_subgraph().

Then arises another interesting problem. Not all edges between your choosen vertices are part of your shortest paths. I'm not sure about what you desire in your sub-graph, but I assume that you only want vertices and edges that are part of shortest paths. The manual for induced_subgraph() says that eids can be provided to filter sub-graphs on edges too, but I didn't get that to work. Comments on that are welcome if anyone cracks it. To create a subgraph with only edges and vertices actually in your shortest path, some surplus edges must be deleted.

Below is an example where some key verticies are chosen at random, the surplus-edge problem of subgraphs is visualized, and a proper shortert-paths-only subgraph is generated:

enter image description here

library(igraph)
N <- 40 # Number of vertices in a random network
E <- 70 # Number of edges in a random network
K <- 5  # Number of KEY vertices between which we are to calculate the
        # shortest paths and extract a sub-graph.

# Make a random network
g <- erdos.renyi.game(N, E, type="gnm", directed = FALSE, loops = FALSE)
V(g)$label <- NA
V(g)$color <- "white"
V(g)$size <- 8
E(g)$color <- "gray"

# Choose some random verteces and mark them as KEY vertices
key_vertices <- sample(1:N, 5)
g <- g %>% set_vertex_attr("color", index=key_vertices, value="red")
g <- g %>% set_vertex_attr("size", index=key_vertices, value=12)

# Find shortest paths between two vertices in vector x:
get_path <- function(x){
  # Get atomic vector of two key verteces and return their shortest path as vector.
  i <- x[1]; j <- x[2]
  # Check distance to see if any verticy is outside component. No possible
  # connection will return infinate distance:
  if(distances(g,i,j) == Inf){
    path <- c()
  } else {
    path <- unlist(shortest_paths(g, i, j, mode="all", weights=NULL)$vpath)
  }
}

# List pairs of key vertices between which we need the shortest path
key_el <- expand.grid(key_vertices, key_vertices)
key_el <- key_el[key_el$Var1 != key_el$Var2,]

# Get all shortest paths between each pair of key_vertices:
paths <- apply(key_el, 1, get_path)

# These are the vertices BETWEEN key vertices - ON the shortest paths between them:
path_vertices <- setdiff(unique(unlist(paths)), key_vertices)
g <- g %>% set_vertex_attr("color", index=path_vertices, value="gray")


# Mark all edges of a shortest path
mark_edges <- function(path, edges=c()){
  # Get a vector of id:s of connected vertices, find edge-id:s of all edges between them.
  for(n in 1:(length(path)-1)){
    i <- path[n]
    j <- path[1+n]
    edge <- get.edge.ids(g, c(i,j), directed = TRUE, error=FALSE, multi=FALSE)
    edges <- c(edges, edge)
  }
  # Return all edges in this path
  (edges)
}

# Find all edges that are part of the shortest paths between key vertices
key_edges <- lapply(paths, function(x) if(length(x) > 1){mark_edges(x)})
key_edges <- unique(unlist(key_edges))
g <- g %>% set_edge_attr("color", index=key_edges, value="green")

# This now shoes the full graph and the sub-graph which will be created
plot(g)

# Create sub-graph:
sg_vertices <- sort(union(key_vertices, path_vertices))
unclean_sg <- induced_subgraph(g, sg_vertices)
# Note that it is essential to provide both a verticy AND an edge-index for the
# subgraph since edges between included vertices do not have to be part of the
# calculated shortest path. I never used it before, but eids=key_edges given
# to induced_subgraph() should work (even though it didn't for me just now).

# See the problem here:
plot(unclean_sg)

# Kill edges of the sub-graph that were not part of shortest paths of the mother
# graph:
sg <- delete.edges(unclean_sg, which(E(unclean_sg)$color=="gray"))

# Plot a comparison:
l <-layout.auto(g)
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
plot(g, layout=l)
plot(unclean_sg, layout=l[sg_vertices,])  # cut l to keep same layout in subgraph
plot(sg, layout=l[sg_vertices,])          # cut l to keep same layout in subgraph
0
Vineet Jain On

You can't find the shortest path between n nodes. Since the shortest path is defined only between two nodes.

I think you want shortest path from 1 node to other n-1 node you can use
get_all_shortest_paths(v, to=None, mode=ALL) from igraph library.

  • v - the source for the calculated paths
  • to - a vertex selector describing the destination for the calculated paths. This can be a single vertex ID, a list of vertex IDs, a single vertex name, a list of vertex names. None means all the vertices.
  • mode - the directionality of the paths. IN means to calculate incoming paths, OUT mean to calculate outgoing paths, ALL means to calculate both ones.

Returns: all of the shortest path from the given node to every other reachable node in the graph in a list.
get_all_shortest_paths

So, now you have to create a graph from a list of the shortest paths.

  1. Initialize an empty graph then add all path to it from the list of the path
    adding path in graph

    OR

  2. make a graph for every shortest path found and take graphs union.
    union igraph