I am trying to bring multiple things together using dplyr: given a time series of multiple returns, I want to calculate the average correlation of each return series with all of the other return series (I simplified my real task to give the easiest possible example). Of course, in contrast to the example below, my real dataset is rather large, is not yet spread(stock,ret), and contains multiple NAs. Also, in a second step I would have to create my own function and supply that to rollapply. Therefore, if you have a suggestion using something from the RcppRoll package I would be more than happy!
In the below example you can see that I need to input all columns at once, select a window, apply a function to all columns simultaneously, receive a vector with the same number of columns and so on...
Here is my example:
library(dplyr)
library(zoo)
df <- data.frame(Date = as.Date("1926-01-01") + 1:24,
                 PERMNO1 = rnorm(24, 0.01, 0.3),
                 PERMNO2 = rnorm(24, 0.02, 0.4),
                 PERMNO2 = rnorm(24, -0.01, 0.6))
df %>%
  do(rollapplyr(.[,-1], width = 12, function(a) colMeans(cor(a))))
What I would like to get is something like this:
df2 <- df; df2[,2:4]<-NA
for (i in 12:24){
df2[i,2:4] <- colMeans(cor(df[(i-12):i,2:4]))
}
df2
Date PERMNO1 PERMNO2 PERMNO2.1
1926-01-02 NA NA NA
1926-01-03 NA NA NA
1926-01-04 NA NA NA
1926-01-05 NA NA NA
1926-01-06 NA NA NA
1926-01-07 NA NA NA
1926-01-08 NA NA NA
1926-01-09 NA NA NA
1926-01-10 NA NA NA
1926-01-11 NA NA NA
1926-01-12 NA NA NA
1926-01-13 0.14701350 0.2001694 0.3787320
1926-01-14 0.15364347 0.2438042 0.3143516
1926-01-15 0.16118233 0.2549841 0.3266877
1926-01-16 0.04727533 0.2534126 0.3132990
1926-01-17 0.05220443 0.2411095 0.2744379
1926-01-18 0.12252848 0.2461743 0.2766122
1926-01-19 0.08414717 0.2287705 0.2897744
1926-01-20 0.11164866 0.2503174 0.2414130
1926-01-21 0.08886537 0.2604810 0.2621597
1926-01-22 0.14216304 0.2667540 0.2543573
1926-01-23 0.12654902 0.3086711 0.2751671
1926-01-24 0.11068607 0.3019835 0.2728166
1926-01-25 0.06714698 0.2696828 0.2184242
Convert the data frame to a zoo object, run rollapplyr, and convert back. The final conversion back to a data frame could be omitted if you want to just keep the answer as a zoo object, which would probably be more convenient than representing a time series as a data frame.
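The code that went with this answer is not preserved here, so the following is only a minimal sketch of the three steps, not necessarily the original code. The seed, the column name PERMNO3 and the object names z and r are assumptions made for illustration. by.column = FALSE passes the whole 12-row window to the function at once, and fill = NA pads the first 11 rows.

```r
library(zoo)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(Date = as.Date("1926-01-01") + 1:24,
                 PERMNO1 = rnorm(24, 0.01, 0.3),
                 PERMNO2 = rnorm(24, 0.02, 0.4),
                 PERMNO3 = rnorm(24, -0.01, 0.6))  # assumption: third series named PERMNO3

# 1. Data frame -> zoo; the first column (Date) becomes the time index.
z <- read.zoo(df)

# 2. Right-aligned rolling window over all columns at once.
#    Each window `a` is a 12 x 3 matrix; colMeans(cor(a)) returns one value per column.
r <- rollapplyr(z, width = 12,
                FUN = function(a) colMeans(cor(a)),
                by.column = FALSE, fill = NA)

# 3. zoo -> data frame (optional); the index column is named "Index".
df2 <- fortify.zoo(r)
```

Note that width = 12 uses exactly 12 rows per window, whereas the loop in the question indexes (i-12):i, which spans 13 rows from i = 13 onward, so the numbers will not match that loop exactly. If the real data contains NAs, one option is to pass use = "pairwise.complete.obs" to cor inside the function.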