Running variance when time window is not constant

246 views Asked by At

I am trying to calculate a moving variance with a window of, let's say 4 years, for each one of the names A, B and C. The data is weekly:

> head(data1, 17)
         date name       value
1  1985-01-01    A -0.44008233
2  1985-01-01    B          NA #Observe that there are some NA's
3  1985-01-01    C  0.38682496
4  1985-01-08    A  0.41806540
5  1985-01-08    B -0.05460831
6  1985-01-08    C -0.52051435
7  1985-01-15    A  1.25769395
8  1985-01-15    B  0.80272053
9  1985-01-15    C -0.34501742
10 1985-01-22    A -0.43401839
11 1985-01-22    B  0.91113966
12 1985-01-22    C  1.07131717
13 1985-01-29    A -1.55395857
14 1985-01-29    B -0.43281709
15 1985-01-29    C  0.98034779
16 1985-02-05    A  1.70557396
17 1985-02-05    B  0.44688788

My approach so far is to dcast the data and then run rollapply() (zoo) column wise with a moving window of 192 = 4*12*4:

v <- dcast(data1, date ~ name, value.var = "value")
var <- rollapply(v[-1], width=4*12*4, var, fill=NA, by.column = T)
var <- cbind(v$date, var)
var[,1] <- as.Date(var[,1])

However, I realized that for some months I have four observations (eg. 7, 14, 21, 28 February) and for some I have five weekly observations (eg.1, 8, 15, 22 and 29 of January), so using a window of 4 years * 12 months * 4 weeks observations is not correct. I was thinking to add these extra observations on the time window (width), but I am not sure how (or if it is even possible), since these change depending on how many 5-weeks-per-month and how many 4-weeks-per-month observations there are inside the time window.

Additionally, I would like to have a NA when there are NA observations within the moving time window (this is handled automatically by var() anyway I think) and also I would like to ignore zero observations. For that I was thinking I could remove the zeroes before running the running variance function and then somehow put them back in in the end. So you can ignore that part, unless, of course, you have some good idea to do it in one step.

Sample data:

set.seed(486)
date <- rep(seq(as.Date("1985-01-01"), as.Date("2010-01-1"), by="weeks"), each=3)
N <- length(date)
name <- c("A","B","C")
value <- rnorm(N)
i<-which(value %in% sample(value, 25)) ;i
j<-which(value %in% sample(value, 150)) ;j
value[i] <- NA
value[j] <- 0
data1 <- data.frame(date, name, value)
2

There are 2 answers

4
agstudy On

I don't think that you should your rolling window should coincide of the number of weeks in your data. It is just a window over data to smooth it. One idea is to do the something with 2 windows and take the average value:

library(data.table)
library(zoo)
setDT(data1)[,var := {
           v1 <- rollapplyr(value,width=4*12*4, var, fill=N)
           v2 <- rollapplyr(value,width=4*12*5, var, fill=N)
           (v1+v2)/2},  name]

PS: Here I am using data.table because it is suitable for split(per group) and rebind operations.

Edit

You can also transform you weekly data to a daily one then you can calculate more accurate a roll in this basis. The idea is to create a daily index and merge it with your original data. This will create a new data.table with missing values. You replace the missing values with first non missing values using na.locf.

library(data.table)
library(zoo)
ID <- 
data.table(
  date = seq(as.Date("1985-01-01"), as.Date("2010-01-1"), by="days"))
setkey(ID,date)

setDT(data1)[,date:=as.Date(date)][, 
        {
          merge(ID,.SD,all.x=TRUE)[,value := na.locf(value)]
        },
        
        name]
4
G. Grothendieck On

4 years has 208 weeks plus 5 days so it is not evenly divisible into weeks. If we use 209 weeks then we are only out by 2 days in 4 years so let us try that.

First convert data1 to "zoo" class splitting the data into separate columns according to the value of the 2nd column. z will have one column for each of A, B and C. Then define a variance function which excludes zeros and use it with rollapplyr

library(zoo)
z <- read.zoo(data1, split = 2) # 1305 x 3 
var0 <- function(x) var(x[x != 0])
r <- rollapplyr(z, 209, var0)

Leaving it as a zoo object may be sufficient but this would convert it to a 4 column data frame with columns Index, A, B and C:

fortify.zoo(r)