Pass a list of variables as a parameter to a custom function using map()


I have a custom function that:

  1. Takes a list of data frames and a string
  2. Makes some data transformation and
  3. Returns a data frame with the transformed data

Reproducible example:

library(purrr)
library(dplyr)

# I have 4 data frames with a few observations that repeat themselves
df_1 <- data.frame(col1 = c(1, 2, 3, 4), col2 = c('apple', 'pineapple', 'orange', 'grape'))
df_2 <- data.frame(col1 = c(2, 3, 4, 5, 6, 7), col2 = c('watermelon', 'orange', 'halibut', 'apple', 'iron', 'grape'))
df_3 <- data.frame(col1 = c(2, 3, 4, 5, 6, 7, 9, 0), col2 = c('rock', 'pineapple', 'apple', 'tire', 'bomb', 'star', 'coconut', 'grape'))
df_4 <- data.frame(col1 = c(1, 4, 9), col2 = c('grape', 'apple', 'rock'))

# All inside another list
df_list <- list(df_1, df_2, df_3, df_4)

# Now we define a function that keeps the rows where col2 matches var1
# and doubles col1 for those rows
toy_function <- function(df_list, var1) {
  map(df_list, ~ .x %>% filter(col2 == var1) %>% mutate(result = col1 * 2)) %>%
    bind_rows() %>% 
    select(result)
}

# Solution from toy_function()
toy_function(df_list = df_list, var1 = 'apple')

Now, what I want to do is to pass a vector of strings to toy_function() as follows:

# Vector of strings to pass to toy_function()
list_of_fruits <- c('apple', 'grape')

# This is where it all goes wrong
map2(.x = df_list, .y = list_of_fruits, .f = toy_function)

# Error
Error in `map2()`:
! Can't recycle `.x` (size 4) to match `.y` (size 2).
Run `rlang::last_trace()` to see where the error occurred.

The result I want to get from the function is:

map2(.x = df_list, .y = list_of_fruits, .f = toy_function)

# Expected results

[[1]]
  result
1 2
2 10
3 8
4 8

[[2]]
  result
1 8
2 14
3 0
4 2

EDIT

As pointed out in the comments, toy_function() should be modified to catch all the variables:

toy_function <- function(df_list, var1) {
  map(df_list, ~ {
    filtered_df <- .x %>% filter(col2 %in% var1)
    filtered_df %>% mutate(result = col1 * 2) %>% select(result)
  }) %>%
    bind_rows()
}

But I still get this error:

> map2(.x = df_list, .y = list_of_fruits, .f = toy_function)
Error in `map2()`:
! Can't recycle `.x` (size 4) to match `.y` (size 2).
Run `rlang::last_trace()` to see where the error occurred.

There is 1 answer

Gregor Thomas On BEST ANSWER

map2() expects .x and .y to have the same length (or one of them to have length 1), and it iterates over them "in parallel". It works like this:

## this map2 call
map2(.x = df_list, .y = list_of_fruits, .f = toy_function)
## is equivalent to this:
list(
  toy_function(df_list[[1]], list_of_fruits[[1]]), 
  toy_function(df_list[[2]], list_of_fruits[[2]]), 
  toy_function(df_list[[3]], list_of_fruits[[3]]),
  ...
)

Notice how both the df_list and the list_of_fruits are iterated at the same time.
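To make the pairing concrete, here is a minimal sketch with two short numeric vectors. The names a, b, paired, recycled and mismatch are made up for illustration and do not come from the question. It uses map2_dbl(), the variant of map2() that returns a numeric vector, only to keep the output compact.

```r
library(purrr)

a <- c(1, 2, 3)
b <- c(10, 20, 30)

# map2_dbl() pairs a[1] with b[1], a[2] with b[2], a[3] with b[3]
paired <- map2_dbl(a, b, \(x, y) x + y)
paired

# A length-1 argument is the one exception: it is recycled to length 3
recycled <- map2_dbl(a, 100, \(x, y) x + y)
recycled

# Lengths 4 and 2 cannot be paired up, so this fails with a recycling error
mismatch <- try(map2(1:4, 1:2, \(x, y) x + y), silent = TRUE)
inherits(mismatch, "try-error")
```

The last call reproduces the situation in the question: a size-4 .x and a size-2 .y have no sensible one-to-one pairing.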

You don't want that. You've written toy_function so that it already expects a list as its first argument, and it uses map internally to iterate over it. You don't need another wrapper to iterate over df_list. You only need to iterate over one object: your vector of fruits.

map(list_of_fruits, \(fruit) toy_function(df_list, fruit))
# [[1]]
#   result
# 1      2
# 2     10
# 3      8
# 4      8
# 
# [[2]]
#   result
# 1      8
# 2     14
# 3      0
# 4      2

map2 is a good choice when you want iteration in parallel, that is, a[1] with b[1], then a[2] with b[2], then a[3] with b[3], and so on. You don't want that here. You want every combination: df_list[1] with list_of_fruits[1], df_list[2] with list_of_fruits[1], df_list[1] with list_of_fruits[2], df_list[2] with list_of_fruits[2], and so on. For that you need nested maps/loops, which is what the outer map() over list_of_fruits combined with the map() inside toy_function() gives you.
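For comparison, the nesting can also be written out with both levels visible in one place. This is only a sketch under some assumptions: double_matches() is a hypothetical helper that handles a single data frame and a single fruit, and the data is trimmed to df_1 and df_4 from the question to keep the example short.

```r
library(purrr)
library(dplyr)

# Two of the question's data frames (df_1 and df_4), to keep the sketch short
df_list <- list(
  data.frame(col1 = c(1, 2, 3, 4), col2 = c('apple', 'pineapple', 'orange', 'grape')),
  data.frame(col1 = c(1, 4, 9), col2 = c('grape', 'apple', 'rock'))
)
list_of_fruits <- c('apple', 'grape')

# Hypothetical helper: works on ONE data frame and ONE fruit
double_matches <- function(df, fruit) {
  df %>%
    filter(col2 == fruit) %>%
    mutate(result = col1 * 2) %>%
    select(result)
}

# Outer map: one pass per fruit. Inner map: one pass per data frame.
nested <- map(set_names(list_of_fruits), \(fruit) {
  map(df_list, \(df) double_matches(df, fruit)) %>%
    bind_rows()
})

nested$apple
nested$grape
```

Every fruit meets every data frame exactly once, which is the "every combination" behaviour that map2() cannot provide.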