I want to make parallel predictions using multidplyr and prophet. Consider the following data:
library(tidyr)
library(dplyr)
library(multidplyr)
library(prophet)
ds = as.Date(c('2016-11-01', '2016-11-02', '2016-11-03', '2016-11-04',
'2016-11-05', '2016-11-06', '2016-11-07', '2016-11-08',
'2016-11-09', '2016-11-10', '2016-11-11', '2016-11-12',
'2016-11-13', '2016-11-14', '2016-11-15', '2016-11-16',
'2016-11-17', '2016-11-18', '2016-11-19', '2016-11-20',
'2016-11-21', '2016-11-22', '2016-11-23', '2016-11-24',
'2016-11-25', '2016-11-26', '2016-11-27', '2016-11-28',
'2016-11-29', '2016-11-30', '2016-11-01', '2016-11-02',
'2016-11-03', '2016-11-04', '2016-11-05', '2016-11-06',
'2016-11-07', '2016-11-08', '2016-11-09', '2016-11-10',
'2016-11-11', '2016-11-12', '2016-11-13', '2016-11-14',
'2016-11-15', '2016-11-16', '2016-11-17', '2016-11-18',
'2016-11-19', '2016-11-20', '2016-11-21', '2016-11-22',
'2016-11-23', '2016-11-24', '2016-11-25', '2016-11-26',
'2016-11-27', '2016-11-28', '2016-11-29', '2016-11-30'))
y = c(15, 17, 18, 19, 20, 54, 67, 23, 12, 34, 12, 78, 34, 12, 3, 45, 67, 89, 12, 111, 123, 112, 14, 566, 345, 123, 567, 56, 87, 90, 45, 23, 12, 10, 21, 34, 12, 45, 12, 44, 87, 45, 32, 67, 1, 57, 87, 99, 33, 234, 456, 123, 89, 333, 411, 232, 455, 55, 90, 21)
group = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B")
df = data.frame(ds, group, y)
Although I am able to make per-group sequential predictions using
df %>%
group_by(group) %>%
do(predict(prophet::prophet(.), prophet::make_future_dataframe(prophet::prophet(.), periods = 7)))
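Note that the sequential call above fits the model twice per group, once for `predict()` and once for `make_future_dataframe()`. A minimal sketch of the same per-group forecast with a single fit is shown here. The helper name `forecast_group`, its `horizon` argument, and the synthetic `y` values are assumptions made for illustration and are not part of my real data.

```r
library(dplyr)
library(prophet)

# Synthetic stand-in for the data above (same shape, made-up y values).
set.seed(1)
df <- data.frame(
  ds    = rep(seq(as.Date("2016-11-01"), by = "day", length.out = 30), 2),
  group = rep(c("A", "B"), each = 30),
  y     = rpois(60, lambda = 50)
)

# Hypothetical helper: fit prophet once, then reuse the fitted model
# for both the future data frame and the prediction.
forecast_group <- function(d, horizon = 7) {
  m <- prophet(d[, c("ds", "y")])                         # one fit per group
  future <- make_future_dataframe(m, periods = horizon)   # history + horizon days
  predict(m, future)
}

# Sequential per-group forecasts, one model fit per group.
forecasts <- df %>%
  group_by(group) %>%
  do(forecast_group(.))
```

The result should have one block of forecast rows per group, with the `group` column kept alongside prophet's output columns such as `yhat`.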
I am not able to parallelize it. So far I have tried the partition and collect commands, as suggested here:
multidplyr::cluster_library(cluster, "prophet")
df %>%
partition(group) %>%
do(predict(prophet::prophet(.), prophet::make_future_dataframe(prophet::prophet(.), periods = 7))) %>%
collect()
This gives me the following error:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
2 nodes produced errors; first error: 'data' must be of a vector type, was 'NULL'
In addition: Warning message:
group_indices_.grouped_df ignores extra arguments
I have also tried the following:
multidplyr::cluster_library(cluster, "purrr")
multidplyr::cluster_library(cluster, "prophet")
df %>%
partition(group) %>%
mutate(m = purrr::map(data, prophet::prophet)) %>%
mutate(future = purrr::map(m, prophet::make_future_dataframe, period = 7)) %>%
mutate(forecast = purrr::map2(m, future, predict)) %>%
collect()
This gives me the following error:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
2 nodes produced errors; first error: Evaluation error: `.x` is not a vector (closure).
In addition: Warning message:
group_indices_.grouped_df ignores extra arguments
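One possible reading of this second error: `df` has no column called `data`, so inside `mutate()` the name `data` may resolve to the base R function `data()`, which would explain the "closure" in the message. For comparison, here is a sequential (non-parallel) sketch of the same pipeline, assuming the list-column is first created with `tidyr::nest()`. The `nest()` step and the synthetic `y` values are assumptions added for illustration; this does not address the multidplyr part.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(prophet)

# Synthetic stand-in for the data above (same shape, made-up y values).
set.seed(1)
df <- data.frame(
  ds    = rep(seq(as.Date("2016-11-01"), by = "day", length.out = 30), 2),
  group = rep(c("A", "B"), each = 30),
  y     = rpois(60, lambda = 50)
)

nested <- df %>%
  group_by(group) %>%
  nest() %>%       # creates the `data` list-column: one data frame per group
  ungroup() %>%
  mutate(
    m        = map(data, prophet),                        # one fitted model per group
    future   = map(m, make_future_dataframe, periods = 7), # dates to predict for
    forecast = map2(m, future, predict)                   # one forecast per group
  )
```

Each row of `nested` then holds one group together with its model, future dates, and forecast as list-columns.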
Thus, I am lost on how to proceed. Any suggestion is more than welcome. Thank you in advance.
P.S.: Here is my sessionInfo():
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux buster/sid
locale:
[1] LC_CTYPE=C LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] purrr_0.2.2.2 bindrcpp_0.2 prophet_0.1.1
[4] Rcpp_0.12.12 multidplyr_0.0.0.9000 dplyr_0.7.2
[7] tidyr_0.6.3
loaded via a namespace (and not attached):
[1] bindr_0.1 magrittr_1.5 munsell_0.4.3
[4] lattice_0.20-35 colorspace_1.3-2 R6_2.2.2
[7] rlang_0.1.1 extraDistr_1.8.6 plyr_1.8.4
[10] tools_3.3.3 parallel_3.3.3 grid_3.3.3
[13] gtable_0.2.0 StanHeaders_2.16.0-1 lazyeval_0.2.0
[16] assertthat_0.2.0 tibble_1.3.3 rstan_2.16.2
[19] gridExtra_2.2.1 ggplot2_2.2.1 codetools_0.2-15
[22] inline_0.3.14 glue_1.1.1 stringi_1.1.5
[25] scales_0.4.1 stats4_3.3.3 pkgconfig_2.0.1
[28] zoo_1.8-0