How to convert a list of sf spatial points into a routable graph

901 views Asked by At

I have an sf dataframe object with a series of points representing the shape of a bus route. I would like to turn this object into a routable graph so I can estimate the time it takes to traverse from point c to t.

Here is what I've tried using the dodgr package but I am not sure what I'm doing wrong here:

library(dodgr)
graph <- weight_streetnet(mydata, wt_profile = "motorcar", type_col="highway" , id_col = "id")

Error in check_highway_osmid(x, wt_profile) : Please specify type_col to be used for weighting streetnet

Reproducible data

The data looks like in the image below

mydata <- structure(list(shape_id = c(52421L, 52421L, 52421L, 52421L, 52421L, 
                              52421L, 52421L, 52421L, 52421L, 52421L, 52421L, 52421L, 52421L, 
                              52421L, 52421L, 52421L, 52421L, 52421L, 52421L, 52421L), length = structure(c(0.191422504106197, 
                              0.191422504106197, 0.191422504106197, 0.191422504106197, 0.191422504106197, 
                              0.191422504106197, 0.191422504106197, 0.191422504106197, 0.191422504106197, 
                              0.191422504106197, 0.191422504106197, 0.191422504106197, 0.191422504106197, 
                              0.191422504106197, 0.191422504106197, 0.191422504106197, 0.191422504106197, 
                              0.191422504106197, 0.191422504106197, 0.191422504106197), units = structure(list(
                              numerator = "km", denominator = character(0)), class = "symbolic_units"), class = "units"), 
                              geometry = structure(list(structure(c(-46.5623281998182, 
                              -23.5213458001468), class = c("XY", "POINT", "sfg")), structure(c(-46.562221, 
                              -23.52129), class = c("XY", "POINT", "sfg")), structure(c(-46.562121, 
                              -23.521235), class = c("XY", "POINT", "sfg")), structure(c(-46.5620233332577, 
                              -23.5211840000609), class = c("XY", "POINT", "sfg")), structure(c(-46.561925666591, 
                              -23.5211330000609), class = c("XY", "POINT", "sfg")), structure(c(-46.561828, 
                              -23.521082), class = c("XY", "POINT", "sfg")), structure(c(-46.5618098335317, 
                              -23.5212126666783), class = c("XY", "POINT", "sfg")), structure(c(-46.5617916670273, 
                              -23.5213433333544), class = c("XY", "POINT", "sfg")), structure(c(-46.5617735004869, 
                              -23.5214740000284), class = c("XY", "POINT", "sfg")), structure(c(-46.5617553339104, 
                              -23.5216046667004), class = c("XY", "POINT", "sfg")), structure(c(-46.5617371672978, 
                              -23.5217353333702), class = c("XY", "POINT", "sfg")), structure(c(-46.5617190006492, 
                              -23.5218660000379), class = c("XY", "POINT", "sfg")), structure(c(-46.5617008339645, 
                              -23.5219966667036), class = c("XY", "POINT", "sfg")), structure(c(-46.5616826672438, 
                              -23.5221273333671), class = c("XY", "POINT", "sfg")), structure(c(-46.5616645004869, 
                              -23.5222580000284), class = c("XY", "POINT", "sfg")), structure(c(-46.5616463336941, 
                              -23.5223886666877), class = c("XY", "POINT", "sfg")), structure(c(-46.5616281668651, 
                              -23.5225193333449), class = c("XY", "POINT", "sfg")), structure(c(-46.56161, 
                              -23.52265), class = c("XY", "POINT", "sfg")), structure(c(-46.5617355000207, 
                              -23.5226427501509), class = c("XY", "POINT", "sfg")), structure(c(-46.5618610000276, 
                              -23.5226355002012), class = c("XY", "POINT", "sfg"))), class = c("sfc_POINT", 
                              "sfc"), precision = 0, bbox = structure(c(xmin = -46.5623281998182, 
                              ymin = -23.52265, xmax = -46.56161, ymax = -23.521082), class = "bbox"), crs = structure(list(
                              epsg = 4326L, proj4string = "+proj=longlat +datum=WGS84 +no_defs"), class = "crs"), n_empty = 0L), 
                              id = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", 
                              "k", "l", "m", "n", "o", "p", "q", "r", "s", "t"), speed_kmh = c(11, 
                              11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
                              11, 11, 11, 11)), sf_column = "geometry", agr = structure(c(shape_id = NA_integer_, 
                              length = NA_integer_, id = NA_integer_, speed_kmh = NA_integer_
                              ), class = "factor", .Label = c("constant", "aggregate", "identity"
                              )), row.names = c("1.13", "1.14", "1.15", "1.16", "1.17", "1.18", 
                              "1.19", "1.20", "1.21", "1.22", "1.23", "1.24", "1.25", "1.26", 
                              "1.27", "1.28", "1.29", "1.30", "1.31", "1.32"), class = c("sf", 
                              "data.table", "data.frame"))

enter image description here

3

There are 3 answers

0
mpadge On BEST ANSWER

The weight_streetnet function is really only designed to handle actual street networks, generally as produced by the osmdata::osmdata_sf/sp/sc() functions. It can nevertheless be tweaked to handle cases like this. The main thing needed is to convert the points into something that knows about edges in between them, like an sf::LINESTRING object:

x <- sf::st_combine (mydata) %>%
    sf::st_cast ("LINESTRING") %>%
    sf::st_sf ()

That gives a single-row object which can then be converted to dodgr format, and the id values matched back on to the edges

net <- weight_streetnet (x, type_col = "shape_id", id_col = "id", wt_profile = 1)
net$from_id <- mydata$id [as.integer (net$from_id)]
net$to_id <- mydata$id [as.integer (net$to_id)]

At that point, dodgr will have calculated and inserted the distances directly from the geographic coordinates. Your distances can then also be inserted and used for routing by replacing the d_weighted values:

net$d_weighted <- as.numeric (mydata$length [1])
dodgr_dists (net, from = "c", to = "t") # 236.0481

If you really want your distances to represent absolute distances used to calculate the final result, then simply replace the $d values

net$d <- net$d_weighted
dodgr_dists (net, from = "c", to = "t") # 3.254183

Note that for "simple" problems like this, igraph will generally be faster, because it calculates routes using a single set of weights. The only real advantage of dodgr in this context is the ability to use "dual weights" - the $d_weighted and $d values - such that the route is calculated according to $d_weighted, and the final distances according to $d.

2
Orlando Sabogal On

I think you can solve it by transforming your data into an igraph object and use the functionalities in the igraph library. You need to establish the Edges and Vertex as well as weight values. In igraph an Edge is a link representing a connection among two nodes (Source and Target). In this case, a link is a "street" and the points are the nodes.

library(igraph)
GraphResult <- data.frame(Source = c(NULL), 
                      Target = c(NULL), 
                      weight  = c(NULL))

for (i in 1:(dim(mydata)[1] - 1)) {

  TempGraphResult <- data.frame(Source = c(0), 
                                Target = c(0), 
                                weight  = c(0))

  TempGraphResult$Source[1] <- mydata$id[i]
  TempGraphResult$Target[1] <- mydata$id[i + 1]
  TempGraphResult$weight[1] <- mydata$length[i]

  GraphResult <- rbind(GraphResult, TempGraphResult) }

MyIgraph <- graph_from_data_frame(GraphResult) 

#In this case works perfectly. But if you have more weight variables and even
#additional variables for the nodes, igraph have functions for constructing the
#igraph object

distances(MyIgraph, "c", "t") #returns 3.254183. Seems correct (0.1914225*17)
SquareMatrix <- distances(MyIgraph)

#*distances() is a function from igraph that performs the routing calculations.

Is possible to achieve more complex networks and calculate routes. For example, you can set the direction of the roads.

Maybe dodger can handle the problem, but I am not sure.

0
luukvdmeer On

If you want to include this in a 'tidy' workflow, you could also consider using a mix between sf and tidygraph. The latter offers a tidy framework for networks/graphs, in the form of a tbl_graph class, which subclasses igraph (hence, you can use tbl_graph objects inside all igraph functions as being an igraph object). However, you can analyze your nodes and edges as being tibbles, and use functions as filter(), select(), mutate(), et cetera. Of course, these tibbles can also contain a geometry list column that we know from sf, adding geographic information to the nodes and edges.

The approach is far from perfect yet, and improvements would be very welcome, but still, it shows another way of handling the problem.

# Load libraries.
library(tidyverse)
library(sf)
library(tidygraph)
library(igraph)
library(units)

Just as in the other answers, we need to create edges between the nodes. For now, I assume that the points are simply connected in alphabetical order. For the tidygraph approach, however, we seem to need numeric ID's rather than characters.

# Add a numeric ID column to the nodes.
nodes <- mydata %>%
    rename(id_chr = id) %>%
    rowid_to_column("id") %>%
    select(id, id_chr, everything())

# Define the source node of each edge, and the target node of each edge.
sources <- nodes %>% slice(-n())
targets <- nodes %>% slice(-1)

# Write a function to create lines between data frames of source and target points.
pt2l <- function(x, y) { st_linestring(rbind(st_coordinates(x), st_coordinates(y))) }

# Create the edges.
edges <- tibble(
        from = sources %>% pull(id), 
        to = targets %>% pull(id), 
        length = sources %>% pull(length), 
        speed = sources %>% pull(speed_kmh),
        geometry = map2(st_geometry(sources), st_geometry(targets), pt2l)
    ) %>% st_as_sf() %>% st_set_crs(st_crs(nodes))

# Add a time column to the edges.
edges <- edges %>%
    mutate(speed = set_units(speed, "km/h")) %>%
    mutate(time = length / speed)

# Clean up the nodes data.
nodes <- nodes %>%
    select(-length, -speed_kmh)

# Create the tbl_graph object out of the nodes and edges.
# Providing the edges as sf object is problematic for tidygraph, unfortunately.
# Therefore, we have to provide them as a tibble.
graph <- tbl_graph(nodes = nodes, edges = as_tibble(edges), directed = FALSE)

This gives us the following tbl_graph object:

# A tbl_graph: 20 nodes and 19 edges
#
# An undirected simple graph with 1 component
#
# Node Data: 20 x 4 (active)
     id id_chr shape_id              geometry
  <int> <chr>     <int>           <POINT [°]>
1     1 a         52421 (-46.56233 -23.52135)
2     2 b         52421 (-46.56222 -23.52129)
3     3 c         52421 (-46.56212 -23.52124)
4     4 d         52421 (-46.56202 -23.52118)
5     5 e         52421 (-46.56193 -23.52113)
6     6 f         52421 (-46.56183 -23.52108)
# … with 14 more rows
#
# Edge Data: 19 x 6
   from    to    length   speed                               geometry      time
  <int> <int>      [km]  [km/h]                       <LINESTRING [°]>       [h]
1     1     2 0.1914225      11 (-46.56233 -23.52135, -46.56222 -23.5… 0.017402…
2     2     3 0.1914225      11 (-46.56222 -23.52129, -46.56212 -23.5… 0.017402…
3     3     4 0.1914225      11 (-46.56212 -23.52124, -46.56202 -23.5… 0.017402…
# … with 16 more rows

Now we have everything in a graph structure, we can select the node we want to route from, and the node we want to route to, and find the shortest path between them with the travel time as weight variable, using the shortest_path function from igraph. We now just work with a one-to-one route ('c' to 't'), but it would be just the same for one-to-many, many-to-one or many-to-many.

# Select the node from which and to which the shortest path should be found.
from_node <- graph %>%
  activate(nodes) %>%
  filter(id_chr == "c") %>%
  pull(id)

to_node <- graph %>%
  activate(nodes) %>%
  filter(id_chr == "t") %>%
  pull(id)

# Find the shortest path between these nodes
path <- shortest_paths(
  graph = graph,
  from = from_node,
  to = to_node,
  output = 'both',
  weights = graph %>% activate(edges) %>% pull(time)
)

The resulting path is a list with the nodes and edges that make up the path.

$vpath
$vpath[[1]]
+ 18/20 vertices, from e43a089:
 [1]  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20


$epath
$epath[[1]]
+ 17/19 edges from e43a089:
 [1]  3-- 4  4-- 5  5-- 6  6-- 7  7-- 8  8-- 9  9--10 10--11 11--12 12--13
[11] 13--14 14--15 15--16 16--17 17--18 18--19 19--20

We can create a subgraph of the original graph that only contains the nodes and edges of the shortest path.

path_graph <- graph %>%
    subgraph.edges(eids = path$epath %>% unlist()) %>%
    as_tbl_graph()
# A tbl_graph: 18 nodes and 17 edges
#
# An undirected simple graph with 1 component
#
# Node Data: 18 x 4 (active)
     id id_chr shape_id              geometry
  <int> <chr>     <int>           <POINT [°]>
1     3 c         52421 (-46.56212 -23.52124)
2     4 d         52421 (-46.56202 -23.52118)
3     5 e         52421 (-46.56193 -23.52113)
4     6 f         52421 (-46.56183 -23.52108)
5     7 g         52421 (-46.56181 -23.52121)
6     8 h         52421 (-46.56179 -23.52134)
# … with 12 more rows
#
# Edge Data: 17 x 6
   from    to    length   speed                               geometry      time
  <int> <int>      [km]  [km/h]                       <LINESTRING [°]>       [h]
1     1     2 0.1914225      11 (-46.56212 -23.52124, -46.56202 -23.5… 0.017402…
2     2     3 0.1914225      11 (-46.56202 -23.52118, -46.56193 -23.5… 0.017402…
3     3     4 0.1914225      11 (-46.56193 -23.52113, -46.56183 -23.5… 0.017402…
# … with 14 more rows

Here, something happens which I don't like. Tidygraph/igraph seems to have an internal node ID structure, and you see that in the subgraph, the from and to columns in the egdes data don't match with our id column in the nodes data anymore, but instead simply refer to the row numbers of the nodes data. I am not sure how to fix that.

Either way, we now have our path from 'c' to 't' as a subgraph, and can easily analyze it. For example, by calculating the total travel time of the path (as was the question).

path_graph %>%
    activate(edges) %>%
    as_tibble() %>%
    summarise(total_time = sum(time))
# A tibble: 1 x 1
  total_time
         [h]
1  0.2958348

But it is also easy to plot it, with geographic information preserved (just by exporting nodes and edges as sf objects).

ggplot() +
  geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey') +
  geom_sf(data = graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey', size = 0.5) +
  geom_sf(data = path_graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), lwd = 1, col = 'firebrick') +
  geom_sf(data = path_graph %>% activate(nodes) %>% filter(id %in% c(from_node, to_node)) %>% as_tibble() %>% st_as_sf(), size = 2)

plot

An r-spatial blog post might be coming up on this tidygraph-sf approach ;)