R furrr: authenticate an API on each future process before running the computation


I am running a parallel computation using furrr in R. The computation requires access to a web API, which needs authentication. When the computation runs in parallel, each worker process has to authenticate separately. In the example below there are 6 workers, so I would like to authenticate once on each of those six processes and then run the calculations. I don't know how to do that with furrr, so I end up authenticating on every row, which is very inefficient.

Below is a simple example for illustration. It won't run as-is because I can't share the api.configure function, but hopefully it gets the idea across.

Thanks

library(tidyverse)
library(furrr)
plan(multisession, workers = 6)  # "multiprocess" is deprecated in recent versions of future

testdf <- starwars %>%
  select(-films, -vehicles, -starships) %>%
  future_pmap_dfr(.f = function(...) {
    # This authenticates on every row, not once per worker
    api.configure(username = "username", password = "password")
    currentrow <- tibble(...)
    tibble(name = currentrow$name, height = currentrow$height)
  })

There are 2 answers

Accepted answer by Courvoisier

The way I solved this was to ask the developer of the API to add a variable to the API package that tests whether the connection is open. That way I authenticate only once on each future process: if the connection is not open, I authenticate, and once that is done, the if clause skips every subsequent authentication call on that process.
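If the API package cannot be changed, a similar once-per-worker effect can be had by splitting the rows into one chunk per worker yourself and authenticating once at the start of each chunk. Note this is a different technique from the package flag described above. It is a minimal sketch under assumptions: `fake_login()` is a hypothetical placeholder for the real `api.configure()`, and the plan uses the default number of workers rather than the question's six.

```r
library(dplyr)
library(purrr)
library(future)
library(furrr)

# Default worker count is availableCores(); the question used workers = 6
plan(multisession)

# Hypothetical placeholder for api.configure(); replace with the real login call
fake_login <- function(username, password) {
  invisible(TRUE)
}

rows <- starwars %>%
  select(-films, -vehicles, -starships)

# Assign rows round-robin to one chunk per worker
n_chunks <- nbrOfWorkers()
chunk_id <- rep_len(seq_len(n_chunks), nrow(rows))
chunks <- split(rows, chunk_id)

testdf <- future_map_dfr(chunks, function(chunk) {
  # Runs once per chunk, i.e. once per worker, instead of once per row
  fake_login(username = "username", password = "password")
  # Process the rows of this chunk sequentially inside the worker
  pmap_dfr(chunk, function(...) {
    currentrow <- tibble(...)
    tibble(name = currentrow$name, height = currentrow$height)
  })
})
```

Because each chunk runs as a single future, the login happens `nbrOfWorkers()` times in total. The rows in `testdf` come back grouped by chunk, so sort them afterwards if the original order matters.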

Answer by Waldi

Try opening the connection before the map:

library(tidyverse)
library(furrr)
plan(multisession, workers = 6)  # "multiprocess" is deprecated in recent versions of future

# Log in once, in the main session, before the map
api.configure(username = "username", password = "password")
ls(all.names = TRUE)  # Check whether new variables holding the connection were created

testdf <- starwars %>%
  select(-films, -vehicles, -starships) %>%
  future_pmap_dfr(.f = function(...) {
    currentrow <- tibble(...)
    tibble(name = currentrow$name, height = currentrow$height)
  }, .options = furrr_options(globals = TRUE))  # globals = TRUE is already the default
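Whether this works likely depends on where `api.configure()` keeps its login state. Multisession workers are separate R sessions, and furrr only ships them copies of objects that the mapped function references by name; state kept elsewhere (for example inside the API package's own namespace, or an open network connection) stays in the main session. A minimal sketch of that behaviour, assuming a plain object `token` as a hypothetical stand-in for the login state:

```r
library(future)
library(furrr)

plan(multisession, workers = 2)

# Hypothetical stand-in for login state created in the main session
token <- "session-token"

# "token" appears only as a string here, so it is not detected as a global
# and the workers never receive it
seen_by_name <- future_map_lgl(1:2, function(i) exists("token"))

# Here token is referenced as a symbol, so furrr exports a copy to each worker
seen_as_global <- future_map_lgl(1:2, function(i) !is.null(token))
```

`seen_by_name` should be all `FALSE` and `seen_as_global` all `TRUE`. So if `ls()` shows no new object that the mapped function actually uses, the workers will probably still need to log in themselves.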