I want to estimate rolling value-at-risk for a dataset of about 22.5 million observations, thus I want to use sparklyr for fast computation. Here is what I did (using a sample database):
library(PerformanceAnalytics)
library(reshape2)
library(dplyr)
data(managers)
data <- zerofill(managers)
data<-as.data.frame(data)
class(data)
data$date=row.names(data)
lmanagers<-melt(data, id.vars=c('date'))
Now I estimate VaR using dplyr and PerformanceAnalytics packages:
library(zoo) # for rollapply()
var <- lmanagers %>% group_by(variable) %>% arrange(variable,date) %>%
mutate(var=rollapply(value, 10,FUN=function(x) VaR(x, p=.95, method="modified",align = "right"), partial=T))
This works fine. Now I do this to make use of sparklyr:
library(sparklyr)
sc <- spark_connect(master = "local")
lmanagers_sp <- copy_to(sc,lmanagers)
src_tbls(sc)
var_sp <- lmanagers_sp %>% group_by(variable) %>% arrange(variable,date) %>%
mutate(var=rollapply(value, 10,FUN=function(x) VaR(x, p=.95, method="modified",align = "right"), partial=T)) %>%
collect
But this gives the following error:
Error: Unknown input type: pairlist
Can anyone please tell me where is the error and what is the correct code? Or any other solution to estimate rolling VaR faster is also appreciates.
For custom
dplyr
backends likesparklyr
,mutate
does not currently support arbitrary R functions defined in other packages; therefore,rollapply()
is currently unsupported.In order to calculate value-at-risk in
sparklyr
, one approach is to extend sparklyr using Scala and R and follow an approach similar to: Estimating Financial Risk with Apache Spark.