multidplyr and prophet for parallel grouped prediction: Error in checkForRemoteErrors(lapply(cl, recvResult))

I would like to make parallel per-group predictions using multidplyr and prophet. Consider the following data:

library(tidyr)
library(dplyr)
library(multidplyr)
library(prophet)

# Every day of November 2016, repeated once per group (60 dates in total)
ds = rep(seq(as.Date('2016-11-01'), as.Date('2016-11-30'), by = 'day'), times = 2)

y = c(15, 17, 18, 19, 20, 54, 67, 23, 12, 34, 12, 78, 34, 12, 3, 45, 67, 89, 12, 111, 123, 112, 14, 566, 345, 123, 567, 56, 87, 90, 45, 23, 12, 10, 21, 34, 12, 45, 12, 44, 87, 45, 32, 67, 1, 57, 87, 99, 33, 234, 456, 123, 89, 333, 411, 232, 455, 55, 90, 21)

# 30 observations for group "A" followed by 30 for group "B"
group = rep(c("A", "B"), each = 30)

df = data.frame(ds, group, y)

I am able to make sequential per-group predictions using

df %>%  
  group_by(group) %>%
  do(predict(prophet::prophet(.), prophet::make_future_dataframe(prophet::prophet(.), periods = 7)))
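As an aside, the do() call above fits the model twice per group, once for predict() and once more for make_future_dataframe(). Below is a minimal sketch that fits each model only once. The helper name forecast_group() and the toy data frame are assumptions made for illustration; they are not part of my real code, and the toy y values are placeholders.

```r
library(dplyr)
library(prophet)

# Toy stand-in with the same layout as df above (y values are illustrative only)
toy <- data.frame(
  ds    = rep(seq(as.Date("2016-11-01"), as.Date("2016-11-30"), by = "day"), times = 2),
  group = rep(c("A", "B"), each = 30),
  y     = rep(c(15, 17, 18, 19, 20, 54), times = 10)
)

# Hypothetical helper: fit the model once, then forecast `periods` days ahead
forecast_group <- function(d, periods = 7) {
  m <- prophet::prophet(d)
  future <- prophet::make_future_dataframe(m, periods = periods)
  predict(m, future)
}

# Sequential per-group forecasts, one model fit per group
forecasts <- toy %>%
  group_by(group) %>%
  do(forecast_group(.))
```

Keeping the per-group work in a single function also makes it easier to hand the same function to whatever parallel backend ends up working.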

However, I am not able to parallelize it. So far I have tried the partition and collect commands, as suggested here:

multidplyr::cluster_library(cluster, "prophet")

df %>%
  partition(group) %>%
  do(predict(prophet::prophet(.), prophet::make_future_dataframe(prophet::prophet(.), periods = 7))) %>%
  collect()

This gives me the following error:

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  2 nodes produced errors; first error: 'data' must be of a vector type, was 'NULL'
In addition: Warning message:
group_indices_.grouped_df ignores extra arguments

I have also tried the following:

multidplyr::cluster_library(cluster, "purrr")
multidplyr::cluster_library(cluster, "prophet")

df %>%
  partition(group) %>%
  mutate(m = purrr::map(data, prophet::prophet)) %>% 
  mutate(future = purrr::map(m, prophet::make_future_dataframe, period = 7)) %>% 
  mutate(forecast = purrr::map2(m, future, predict)) %>%
  collect()

This gives me the following error:

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  2 nodes produced errors; first error: Evaluation error: `.x` is not a vector (closure).
In addition: Warning message:
group_indices_.grouped_df ignores extra arguments
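One likely reading of this second message: df has no column named data, because nothing in the pipeline nests the rows per group (for example with tidyr::nest(), which I believe names its list-column data by default). The name data would then resolve to the base function utils::data, which is a closure, and purrr::map() would reject it. The base R sketch below only illustrates that lookup; the toy data frame is an assumption, and split() stands in for a per-group nesting step.

```r
# Toy frame with the same columns as df in the question (values are illustrative)
toy <- data.frame(
  ds    = as.Date("2016-11-01") + 0:3,
  group = rep(c("A", "B"), each = 2),
  y     = c(15, 17, 45, 23)
)

# There is no column called "data" in the frame ...
has_data_column <- "data" %in% names(toy)

# ... so a bare `data` would fall through to the function utils::data
data_type <- typeof(utils::data)

# Base R stand-in for a nesting step: one data frame per group
nested <- split(toy[, c("ds", "y")], toy$group)

print(has_data_column)
print(data_type)
print(names(nested))
```

If this reading is right, each group's rows would need to be gathered into a list-column (or a list, as here) before mapping prophet::prophet over them.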

I am therefore unsure how to proceed. Any suggestion is more than welcome. Thank you in advance.

P.S. Here is my sessionInfo():

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux buster/sid

locale:
 [1] LC_CTYPE=C                 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] purrr_0.2.2.2         bindrcpp_0.2          prophet_0.1.1
[4] Rcpp_0.12.12          multidplyr_0.0.0.9000 dplyr_0.7.2
[7] tidyr_0.6.3

loaded via a namespace (and not attached):
 [1] bindr_0.1            magrittr_1.5         munsell_0.4.3
 [4] lattice_0.20-35      colorspace_1.3-2     R6_2.2.2
 [7] rlang_0.1.1          extraDistr_1.8.6     plyr_1.8.4
[10] tools_3.3.3          parallel_3.3.3       grid_3.3.3
[13] gtable_0.2.0         StanHeaders_2.16.0-1 lazyeval_0.2.0
[16] assertthat_0.2.0     tibble_1.3.3         rstan_2.16.2
[19] gridExtra_2.2.1      ggplot2_2.2.1        codetools_0.2-15
[22] inline_0.3.14        glue_1.1.1           stringi_1.1.5
[25] scales_0.4.1         stats4_3.3.3         pkgconfig_2.0.1
[28] zoo_1.8-0