dplyr rolling change multiple columns based on multiple columns


I am trying to combine several things using dplyr. Given a time series of multiple returns, I want to calculate the average correlation of each return series with all of the other return series (I have simplified my real task to give the easiest possible example). Unlike the example below, my real dataset is rather large, is not yet in wide format (spread(stock, ret)), and contains multiple NAs. In a second step I will also have to write my own function and supply it to rollapply. So if you have a suggestion that uses something from the RcppRoll package, I would be more than happy!

In the example below you can see that I need to pass in all columns at once, select a window, apply a function to all columns simultaneously, receive a vector with one value per column, and then repeat this for the next window.

Here is my example:

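library(dplyr)
library(zoo)

df <- data.frame(Date = as.Date("1926-01-01") + 1:24,
                 PERMNO1 = rnorm(24, 0.01, 0.3),
                 PERMNO2 = rnorm(24, 0.02, 0.4),
                 PERMNO2 = rnorm(24, -0.01, 0.6))  # duplicate name; data.frame() renames it to PERMNO2.1

# My attempt:
df %>%
  do(rollapplyr(.[, -1], width = 12, function(a) colMeans(cor(a))))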

What I would like to get is something like this:

df2 <- df; df2[, 2:4] <- NA
for (i in 12:24) {
  # trailing window of 12 rows ending at row i
  df2[i, 2:4] <- colMeans(cor(df[(i - 11):i, 2:4]))
}
df2
     Date    PERMNO1   PERMNO2 PERMNO2.1
1926-01-02         NA        NA        NA
1926-01-03         NA        NA        NA
1926-01-04         NA        NA        NA
1926-01-05         NA        NA        NA
1926-01-06         NA        NA        NA
1926-01-07         NA        NA        NA
1926-01-08         NA        NA        NA
1926-01-09         NA        NA        NA
1926-01-10         NA        NA        NA
1926-01-11         NA        NA        NA
1926-01-12         NA        NA        NA
1926-01-13 0.14701350 0.2001694 0.3787320
1926-01-14 0.15364347 0.2438042 0.3143516
1926-01-15 0.16118233 0.2549841 0.3266877
1926-01-16 0.04727533 0.2534126 0.3132990
1926-01-17 0.05220443 0.2411095 0.2744379
1926-01-18 0.12252848 0.2461743 0.2766122
1926-01-19 0.08414717 0.2287705 0.2897744
1926-01-20 0.11164866 0.2503174 0.2414130
1926-01-21 0.08886537 0.2604810 0.2621597
1926-01-22 0.14216304 0.2667540 0.2543573
1926-01-23 0.12654902 0.3086711 0.2751671
1926-01-24 0.11068607 0.3019835 0.2728166
1926-01-25 0.06714698 0.2696828 0.2184242

1 answer

G. Grothendieck (best answer)

Convert the data frame to a zoo object, run rollapplyr and convert back:

library(dplyr)
library(zoo)

df %>%
   read.zoo %>%
   rollapplyr(12, function(x) colMeans(cor(x)), by.column = FALSE, fill = NA) %>%
   fortify.zoo

The last line can be omitted if you just want to keep the result as a zoo object, which is probably more convenient than representing a time series as a data frame.
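The question also mentions NAs and a user-defined function. As a minimal sketch of how that might look with the same approach: the helper name avg_cor, the fixed seed, the injected NA, and the column name PERMNO3 (used instead of the duplicated PERMNO2) are all assumptions made for illustration. The helper uses the use = "pairwise.complete.obs" option of cor, which may or may not be the NA treatment you want for your real data.

```r
library(zoo)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(Date = as.Date("1926-01-01") + 1:24,
                 PERMNO1 = rnorm(24, 0.01, 0.3),
                 PERMNO2 = rnorm(24, 0.02, 0.4),
                 PERMNO3 = rnorm(24, -0.01, 0.6))
df$PERMNO2[5] <- NA  # assumption: one missing value to mimic the real data

# Hypothetical helper: for each column, the mean of its correlations with
# all columns (itself included), computed on pairwise-complete observations.
avg_cor <- function(x) colMeans(cor(x, use = "pairwise.complete.obs"))

z   <- read.zoo(df)                          # first column becomes the index
res <- rollapplyr(z, 12, avg_cor,
                  by.column = FALSE,         # pass the whole window at once
                  fill = NA)                 # pad the first 11 rows with NA
out <- fortify.zoo(res)                      # back to a data frame
```

Because by.column = FALSE hands the whole 12-row window to the function, any function that takes a matrix and returns one value per column can be swapped in for avg_cor.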