I am running a parallel computation using furrr in R. The computation requires access to a web API, which in turn requires authentication. When I run it in parallel, each process needs to authenticate.
In the example below I have 6 processes, so I would need to authenticate on each of these six processes first and then run the calculations. I don't know how to do that using furrr, so I end up authenticating on every run, which is really inefficient.
Below is a simple example for illustrative purposes. It does not run as-is because I can't share the api.configure function, but hopefully you get the idea.
Thanks
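library(tidyverse)
library(furrr)

# multiprocess is deprecated in the future package; multisession is the portable replacement
plan(multisession, workers = 6)

testdf = starwars %>%
  select(-films, -vehicles, -starships) %>%
  future_pmap_dfr(.f = function(...){
    # api.configure() comes from the API package that cannot be shared; here it runs on every row
    api.configure(username = "username", password = "password")
    currentrow = tibble(...)
    l = tibble(name = currentrow$name, height = currentrow$height)
    return(l)
  })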
The way to solve this was to ask the developer of the API to add a variable to the API package that tests whether the connection is open. This way I authenticate once on each of the future processes: api.configure is called only if the connection is not open, and once that is done, all subsequent authentication calls in that process are skipped by the if clause.
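For anyone who cannot get the API package changed, here is a minimal sketch of the same "authenticate only if not already done" guard. It swaps the package-level variable for a flag kept in a closure. Everything in it is an assumption for illustration: the stub api.configure stands in for the real function, and make_authenticator, ensure_authenticated and did_auth are hypothetical names. It also relies on furrr's default scheduling, which as far as I understand sends one chunk of rows to each worker, so the flag is remembered for all rows in that chunk.

```r
library(furrr)

plan(multisession, workers = 2)

# Hypothetical stand-in for the real api.configure(); replace it with the
# function from the API package.
api.configure <- function(username, password) invisible(TRUE)

# Build a function that authenticates only on its first call. The flag lives
# in the closure's environment, which is copied to a worker together with the
# function, so it persists across the rows that worker handles in one chunk.
make_authenticator <- function(configure) {
  authenticated <- FALSE
  function() {
    did_auth <- !authenticated
    if (did_auth) {
      configure(username = "username", password = "password")
      authenticated <<- TRUE
    }
    did_auth  # TRUE only for the call that actually authenticated
  }
}
ensure_authenticated <- make_authenticator(api.configure)

# Small made-up input so the sketch runs without the starwars data.
input <- data.frame(
  name = c("a", "b", "c", "d", "e", "f"),
  height = c(1, 2, 3, 4, 5, 6)
)

result <- future_pmap_dfr(input, function(...) {
  did_auth <- ensure_authenticated()  # skipped after the first row of a chunk
  currentrow <- tibble::tibble(...)
  tibble::tibble(
    name = currentrow$name,
    height = currentrow$height,
    pid = Sys.getpid(),
    did_auth = did_auth
  )
})

plan(sequential)  # shut the workers down again
```

sum(result$did_auth) then tells you how many times authentication actually ran, which you can compare against nrow(result). If the API package does expose a "connection is open" variable, testing that inside the function is the more robust option, because it reflects the real state of the connection in that process rather than a copy of a flag.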