Performance Issue with Computing Travel Time Matrix using r5r package and large pbf file


I am encountering a performance issue while using the r5r package to compute a travel time matrix.

I want to compute a travel time matrix from approximately 20,000 Belgian neighborhoods to around 250 Belgian hospitals. To do this, I downloaded the Belgium PBF file from Geofabrik (approximately 558 MB). I then used the setup_r5() function to create a network.dat file (159 MB). While creating this file I received some warning messages, but no errors occurred.

When I tried to compute the travel time matrix, the process was exceptionally slow. To troubleshoot, I narrowed my sample down to the Brussels neighborhoods (724 points) and the Belgian hospitals. Even with this reduction, the computation took over 2 hours to complete. That is far slower than the time reported in the r5r paper for a 1227 x 1227 matrix (less than 1 minute; see https://findingspress.org/article/21262-r5r-rapid-realistic-routing-on-multimodal-transport-networks-with-r-5-in-r). I am running the code on a Windows machine with an Intel(R) i7-8750H CPU @ 2.20GHz and 16.0 GB of RAM.

Could the sluggish performance be due to the size of the network.dat file, or is there another factor that might be causing this delay?

Reproducible example:


options(java.parameters = '-Xmx6G')

library(sf)
library(r5r)
library(tidyverse)
# build transport network

r5r_core <- setup_r5("D:\\accessibility\\data", verbose = FALSE)

# load origin/destination points
# load() restores objects under the names they were saved with; the names
# origins_i and destinations_j are assumed to match the saved objects
load("origins.Rdata")
load("destination.Rdata")

# build ttm
start.time <- Sys.time()

ttm_chunk <- travel_time_matrix(
  r5r_core = r5r_core,
  origins = origins_i,
  destinations = destinations_j,
  mode = "CAR",
  departure_datetime = as.POSIXct("13-05-2019 14:00:00",
                                 format = "%d-%m-%Y %H:%M:%S"),
  max_trip_duration = 60,
  verbose = FALSE,
  progress = FALSE)

# 2023-10-30 10:37:41,990 [ForkJoinPool.commonPool-worker-3] ERROR c.c.r.t.TransportNetwork -
# TransportNetwork transit layer is loaded but timezone is unknown; API request times will be
# interpreted as GMT.
# (the same message is repeated for several other worker threads)
end.time <- Sys.time()
end.time - start.time   # elapsed time of the travel_time_matrix() call


1 Answer

Answered by dhersz:

It's hard to pinpoint exactly what might be the cause of the function running slowly.

A larger PBF will result in a larger transport network, which in turn may result in a larger pool of itinerary options between origins and destinations. This may make the computation process slower, but it's hard to say how much slower.
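If you want to gauge how much the network size itself contributes, one option (only a sketch, not something from the answer above; the bounding box and file names are illustrative placeholders) is to crop the PBF to the study area before building the network, for example by calling the osmium command-line tool from R:

# crop the Belgium extract to a rough Brussels bounding box with osmium
# (assumes osmium-tool is installed and on the PATH; the lon/lat values and
# file names are illustrative placeholders)
system2("osmium", c("extract",
                    "--bbox", "4.24,50.76,4.48,50.91",
                    "-o", "brussels.osm.pbf",
                    "belgium-latest.osm.pbf"))

# rebuild the network from the smaller extract (placed in its own data folder)
# and compare run times against the full Belgium network
r5r_bxl <- setup_r5("D:/accessibility/data_brussels", verbose = FALSE)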

One thing that stands out in your example is that you're calculating trips using mode = "CAR", while the paper uses c("WALK", "TRANSIT"). Car trips are much faster and can travel much further than walk and transit trips. Consequently, the pool of itinerary options between origins and destinations is much larger when considering car trips than when considering walk/transit trips.

I suspect the root cause of the slow computation is the transport mode, since you're considering a fairly large area in which the set of feasible transit trips is much smaller than the set of feasible car trips. Try, for example, using mode = "WALK" to see if this speeds up the process.
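A rough sketch of that test (not part of the original answer; it reuses the object names from the question and changes only the mode):

# rerun the same query with walking instead of driving and compare the elapsed
# time against the CAR run; c("WALK", "TRANSIT") would mirror the paper's setup
start_walk <- Sys.time()
ttm_walk <- travel_time_matrix(
  r5r_core = r5r_core,
  origins = origins_i,
  destinations = destinations_j,
  mode = "WALK",
  departure_datetime = as.POSIXct("13-05-2019 14:00:00",
                                  format = "%d-%m-%Y %H:%M:%S"),
  max_trip_duration = 60,
  verbose = FALSE,
  progress = FALSE)
Sys.time() - start_walk

If this run finishes quickly, that points to the mode (and the resulting pool of itineraries) rather than the network file being the bottleneck.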

If this is the case, I don't think there is any solution to your problem, as the slow computation is caused by the sheer size of your datasets.