Parallel Processing with r5r


I'm currently testing the r5r package, whose documentation says it uses parallel computation by default. I have a large number of origin-destination pairs to analyze with its detailed_itineraries function, so I wanted to see whether any of the other parallelization tools in R could speed it up further.

I am using this code:

library(r5r)
library(sf)
library(tigris)
library(future.apply)
library(rJava)
library(tidyverse)

r5r_core <- setup_r5(data_path = path, verbose = FALSE)

tracts2 <- tracts(state = "PA", county = "Philadelphia", year=2019)%>%
  select(GEOID)%>%
  st_centroid()%>%
  st_transform("EPSG:4326")%>%
  arrange(GEOID)%>%
  rename(id = GEOID)%>%
  mutate(lon = unlist(map(geometry,1)),
         lat = unlist(map(geometry,2)))%>%
  st_set_geometry(NULL)%>%
  as.data.frame()

mode <- c("WALK", "TRANSIT")
max_walk_time <- 30 # minutes
departure_datetime <- as.POSIXct("14-06-2023 8:30:00",
                                 format = "%d-%m-%Y %H:%M:%S")

plan(multicore)

fn <- function(x, y){
  detailed_itineraries(r5r_core = r5r_core,
                       origins = x,
                       destinations = y,
                       mode = mode,
                       departure_datetime = departure_datetime,
                       max_walk_time = max_walk_time,
                       walk_speed = 4.5,
                       max_trip_duration = 60,
                       shortest_path = TRUE,
                       all_to_all = FALSE,
                       drop_geometry = TRUE,
                       progress= TRUE)
}

future_mapply(fn, tracts2, tracts2)

And I am getting this error:

Error in assign_points_input(origins, "origins") : 
  'origins' must be either a 'data.frame' or a 'POINT sf'.

What is going wrong here? Alternatively, am I barking up the wrong tree trying to gain any speed this way?

1 Answer

dhersz (best answer)

{r5r} functions cannot be sped up by parallelizing from R. We've tested this a few times already, and every R-side parallelization we tried was less efficient than the parallelization already implemented in Java. To control the number of threads used when routing, please use the n_threads parameter.
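
As a rough sketch of what that could look like, the wrapper below passes the whole data.frames to detailed_itineraries() in a single call and exposes n_threads. The name route_pairs is hypothetical, the routing settings are copied from the question, and r5r_core is assumed to come from setup_r5() as in the question.

```r
# Hypothetical helper: one call for all origin-destination pairs, leaving the
# parallel work to R5's own Java threads instead of an R-side future plan.
route_pairs <- function(r5r_core, origins, destinations, n_threads = Inf) {
  r5r::detailed_itineraries(
    r5r_core = r5r_core,
    origins = origins,            # whole data.frame with id, lon, lat columns
    destinations = destinations,  # paired row by row because all_to_all = FALSE
    mode = c("WALK", "TRANSIT"),
    departure_datetime = as.POSIXct("14-06-2023 8:30:00",
                                    format = "%d-%m-%Y %H:%M:%S"),
    max_walk_time = 30,           # minutes, as in the question
    walk_speed = 4.5,
    max_trip_duration = 60,
    shortest_path = TRUE,
    all_to_all = FALSE,
    n_threads = n_threads,        # Inf lets r5r use all available threads
    drop_geometry = TRUE,
    progress = TRUE
  )
}
```

With the objects from the question, a call such as route_pairs(r5r_core, tracts2, tracts2, n_threads = 4) would cap routing at four threads. The value 4 is only an illustration.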

PS: The error you're seeing results from incorrect future_mapply() usage. Under the hood, a data.frame is a list with some additional attributes and methods, so when you pass a data.frame to future_mapply() the function iterates over each one of its columns. Effectively, what your code does is pass tracts2's columns (id, lon and lat) to detailed_itineraries() one at a time, not tracts2 itself. Each call therefore receives a plain vector rather than a data.frame, which is why the input check fails.
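
The column-wise iteration can be seen with base R's mapply(), which walks its arguments the same way. This is a small illustration on a toy data.frame with made-up coordinates standing in for tracts2; it only demonstrates the mechanism and is not a suggestion to parallelize over rows.

```r
# Toy stand-in for tracts2 (hypothetical values)
points <- data.frame(
  id  = c("a", "b", "c"),
  lon = c(-75.16, -75.17, -75.18),
  lat = c(39.95, 39.96, 39.97)
)

# A data.frame is a list of columns, so mapply() calls the function once per
# column: x is the id vector, then the lon vector, then the lat vector.
by_column <- mapply(function(x, y) is.data.frame(x), points, points)
print(by_column)  # FALSE for every column: no call ever sees a data.frame

# To hand a function one-row data.frames instead, the rows have to be split
# out explicitly first.
rows <- split(points, seq_len(nrow(points)))
by_row <- vapply(rows, is.data.frame, logical(1))
print(by_row)     # TRUE for every row
```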