I'm trying to understand how to pass future_map a list of character vectors as .x so that each element is evaluated by .f. In the code below, I feed bestVars (a list of character vectors of variable names) to .x, which in turn passes each element of the list to a custom function, run_sim_in_par. The call uses mget(.x) to get the values of each variable named in .x from the global environment; the custom function then Reduces these values and finally performs a few other steps.
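To make the lookup step concrete, here is a minimal base-R sketch of how mget() resolves a character vector of names. The two environments and their values are hypothetical stand-ins (outer_env plays the role of the global environment, inner_env the frame of a function called from it), not the data from my reprex. One thing it shows is that mget() defaults to inherits = FALSE, so it searches only the environment it is given unless told otherwise:

```r
# Hypothetical stand-ins for the reprex globals: `outer_env` plays the role
# of the global environment, `inner_env` the frame of a function called from it.
outer_env <- new.env()
list2env(list(a1 = c(1, 2, 3), b2 = c(10, 20, 30)), envir = outer_env)
inner_env <- new.env(parent = outer_env)

nms <- c("a1", "b2")

# Default inherits = FALSE: only inner_env itself is searched, so this errors
# and `msg` holds the error message instead of "found".
msg <- tryCatch(
  {
    mget(nms, envir = inner_env)
    "found"
  },
  error = function(e) conditionMessage(e)
)

# inherits = TRUE walks up the enclosing environments and finds both vectors.
vals <- mget(nms, envir = inner_env, inherits = TRUE)

# Element-wise product of the retrieved vectors, as Reduce() does in the reprex.
prod_vals <- Reduce(`*`, vals)
```

This only illustrates name lookup within a single R session; whether the named objects exist at all on a multisession worker is a separate matter.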
When I attempt to run the run_sim_in_par function under a multisession plan, I keep getting:
Error in (function (.x, .f, ..., .progress = FALSE) :
ℹ In index: 1.
Caused by error:
! value for 'a1' not found
After reading several questions and other sources on this error, I understand that future cannot automatically identify global variables that are specified only via a character string (https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html). In my example, what is the proper way to have future_map get the global values referenced in .x when .x is a list of character vectors? I've been unsuccessful with the many different ways I've tried setting the globals and options arguments.
The future vignette linked above suggests the following workaround:
The workaround is to tell the future framework what additional globals are needed. This can be done via argument globals using:
> f <- future(my_sum("a"), globals = structure(TRUE, add = "a"))
> y <- value(f)
> y
[1] 6
or by injecting variable a at the beginning of the future expression, e.g.
> f <- future({ a; my_sum("a") })
> y <- value(f)
> y
[1] 6
But I'm having a hard time understanding how to modify my code based on the suggested action.
I'm sure this will come up, so I'll mention it pre-emptively: I assign the df columns to my global environment to reduce the size of the globals exported by future, because their size significantly slows the code when running multisession on remote AWS clusters.
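The size argument can be illustrated with a rough sketch. It assumes a toy data frame (hypothetical, and much smaller than the one in the reprex) and uses object.size(), which estimates in-memory size rather than the serialized payload that future actually sends to a worker, so the numbers are only indicative:

```r
set.seed(42)  # reproducible toy data, far smaller than the real table

# Hypothetical miniature of the reprex data: 10 numeric columns, 1000 rows.
cols <- paste0(letters[1:10], 1:10)
toy <- as.data.frame(
  matrix(rnorm(1000 * length(cols)), nrow = 1000, dimnames = list(NULL, cols))
)

# One combination of five variables, like a single element of bestVars.
needed <- cols[1:5]

# object.size() estimates in-memory size; the serialized payload may differ,
# so treat these numbers as a rough guide only.
size_full   <- as.numeric(object.size(toy))
size_needed <- as.numeric(object.size(toy[needed]))

# Fraction of the full table that the five needed columns account for.
ratio <- size_needed / size_full
```

In this toy setup the five needed columns take up roughly half of the full table, which is the kind of saving I'm after by exporting only the variables each task uses.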
library(future)
library(furrr)
library(kit)
library(tidyverse)

## the errors below occur under a multisession plan
plan(multisession)
## reprex data
## 1:10 is recycled across the 26 letters, giving 26 names (a1, b2, ..., z6)
vars <- paste0(letters, 1:10)
bestVars <- combn(vars, 5, simplify = FALSE)
df <- data.frame(
  matrix(data = rnorm(50000 * length(vars), 200, 500), nrow = 50000, ncol = length(vars))
)
names(df) <- vars
df$value <- rnorm(n = nrow(df), 350, 300)
df <- df %>%
  dplyr::select(value, everything())
## split into one numeric vector per column and assign each to the global environment
df <- lapply(split.default(x = df, names(df)), function(x) x[[1]])
list2env(df, globalenv())
rm(df)
run_sim_in_par <- function(vars_to_sim)
{
  # draw 50 random row indices; `value` is read from the global environment
  sampled_rows <- sample(x = seq_along(value), size = 50, replace = FALSE)
  varname <- paste(names(vars_to_sim), collapse = "*")
  # element-wise product of the selected variables, restricted to the sampled rows
  best <- Reduce(`*`, vars_to_sim)[sampled_rows]
  # positions of the 5 largest products
  row_idx <- kit::topn(best, n = 5, decreasing = TRUE, hasna = FALSE, index = TRUE)
  best_row_value <- value[sampled_rows][row_idx]
  sim <- data.frame(var = varname,
                    mean_value = mean(best_row_value))
  return(sim)
}
## working when explicitly declaring .x
x <- bestVars[[1]]
simulated_res <- run_sim_in_par(vars_to_sim = mget(x))
## not recognizing .x
simulated_res <- future_map_dfr(
  .x = bestVars,
  .f = ~run_sim_in_par(vars_to_sim = mget(.x))
)
# Error in (function (.x, .f, ..., .progress = FALSE) :
# ℹ In index: 1.
# Caused by error:
# ! value for 'a1' not found
## same error when setting furrr_options for 'globals'
simulated_res <- future_map_dfr(
  .x = bestVars,
  .f = ~run_sim_in_par(vars_to_sim = mget(.x)),
  .options = furrr_options(globals = TRUE)
)
## attempt at declaring all globals for just the first element of bestVars
## (bestVars[1] keeps the list wrapper, so .x is the whole character vector)
simulated_res <- future_map_dfr(
  .x = bestVars[1],
  .f = ~run_sim_in_par(vars_to_sim = mget(.x)),
  .options = furrr_options(globals = c(bestVars[[1]], "run_sim_in_par", "value"))
)
# Error in (function (.x, .f, ..., .progress = FALSE) :
# ℹ In index: 1.
# Caused by error:
# ! value for 'a1' not found