R, time series, average, sequence


Hey guys, I have a zoo object of daily observations covering 30 years that looks something like this:

date        x
1971-11-01  145.234
1971-11-02  234.522
1971-11-03  423.32
1971-11-04  333.11

I would like to calculate the mean of each November-to-April period across the entire time series, so my desired result should look like this:

date              x
11/1971-04/1972   642.43
11/1972-04/1973   142.53
11/1973-04/1974   642.39
11/1974-04/1975   424.75
11/1975-04/1976   185.34

Can somebody help me?

Accepted answer (by lmo):

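If your dates are truly daily, so that 1 November appears in every year, the following strategy groups the observations into November-to-April periods. It works on a data.frame df with a Date column date and a numeric column x (the example data are given at the end).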

# drop observations outside of the window (May through October).
dfNew <- df[!grepl("^(0[56789]|10)", format(df$date, "%m")), ]

# build groups: a new group starts at each 1 November
dfNew$groups <- cumsum(c(TRUE, tail(grepl("11-01", format(dfNew$date, "%m-%d")), -1)))

The first line uses a regular expression on the two-digit month to build a logical vector, and keeps only the rows whose month is not May through October. The second line uses a regular expression to return a logical vector that is TRUE wherever the observation falls on 1 November. I drop the first element of that vector with tail(..., -1) and prepend TRUE, so that the group count always begins at 1, whatever the date of the first observation. cumsum then turns these start markers into group indicators, incrementing by one at each 1 November.
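To see what the grouping line does in isolation, here is a minimal sketch on a short, made-up vector of month-day strings. The names md, is_nov1, starts and groups are illustrative only and are not part of the answer's code.

```r
# Hypothetical month-day strings standing in for format(dfNew$date, "%m-%d")
md <- c("01-15", "04-30", "11-01", "12-31", "03-10", "11-01", "11-02")

# TRUE wherever the observation falls on 1 November
is_nov1 <- grepl("11-01", md)

# Drop the first flag with tail(..., -1) and prepend TRUE, so the first
# observation always opens group 1, whatever its date
starts <- c(TRUE, tail(is_nov1, -1))

# Each TRUE increments the running count, giving one id per season
groups <- cumsum(starts)
print(groups)
# [1] 1 1 2 2 2 3 3
```

Without the tail/TRUE step, a series that does not start on 1 November would begin with group 0 instead of group 1.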

At this point, you can use aggregate, for example, to get the group mean:

aggregate(x ~ groups, data=dfNew, FUN=mean)
  groups           x
1      1 -0.14947871
2      2 -0.02742739
3      3 -0.02296979
4      4  0.01939372
5      5 -0.01432937
6      6  0.10393297
7      7  0.06660049
8      8 -0.03955617
9      9 -0.06956639
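The question asks for period labels such as 11/1971-04/1972 rather than bare group numbers. A minimal sketch, assuming the answer's simulated data (rebuilt here so the block runs on its own), labels each group by the month and year of its first and last observation. The names by_group, labels, means and period are illustrative, not part of the original answer.

```r
# Rebuild the answer's example data, filter and groups
set.seed(1234)
df <- data.frame(date = seq(as.Date("1971-01-01"), as.Date("1979-01-01"), by = "day"))
df$x <- rnorm(nrow(df))
dfNew <- df[!grepl("^(0[56789]|10)", format(df$date, "%m")), ]
dfNew$groups <- cumsum(c(TRUE, tail(grepl("11-01", format(dfNew$date, "%m-%d")), -1)))

# Mean of x per group, as in the answer
means <- aggregate(x ~ groups, data = dfNew, FUN = mean)

# Split the dates by group (split keeps the Date class) and build
# a "mm/YYYY-mm/YYYY" label from the first and last date of each group
by_group <- split(dfNew$date, dfNew$groups)
labels <- vapply(by_group, function(d)
  paste(format(min(d), "%m/%Y"), format(max(d), "%m/%Y"), sep = "-"),
  character(1))

# Attach the labels to the aggregated means, matching on group id
means$period <- unname(labels[as.character(means$groups)])
print(means[, c("period", "x")])
```

With this data the first label is 01/1971-04/1971 and the last is 11/1978-01/1979: the simulated series starts on 1 January 1971 and ends on 1 January 1979, so those two groups are partial seasons.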

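Data (because this series runs from 1 January 1971 to 1 January 1979, groups 1 and 9 above are partial seasons):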

set.seed(1234)
df <- data.frame(date=seq(as.Date("1971-01-01"), as.Date("1979-01-01"), by="day"),
                 x=rnorm(2923))
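The grouping above relies on 1 November being present in every year. If the series has gaps, one alternative (a different technique from the answer, sketched here under that assumption) is to key each row by the year in which its season starts: November and December use their own year, January to April use the previous year. The gappy example data and the names mon, yr, keep, season and res are hypothetical.

```r
# Hypothetical daily data from which every 1 November has been removed,
# so a "11-01" marker would never start a new group
set.seed(1)
df <- data.frame(date = seq(as.Date("1971-11-01"), as.Date("1973-04-30"), by = "day"))
df$x <- rnorm(nrow(df))
df <- df[format(df$date, "%m-%d") != "11-01", ]

mon <- as.integer(format(df$date, "%m"))
yr  <- as.integer(format(df$date, "%Y"))

# Keep November through April only
keep <- mon >= 11 | mon <= 4
dfNew <- df[keep, ]

# Season key = year in which the season started: Nov/Dec use their own
# year, Jan-Apr belong to the season that began the previous November
dfNew$season <- ifelse(mon[keep] >= 11, yr[keep], yr[keep] - 1L)

# Mean per season, labelled in the format the question asks for
res <- aggregate(x ~ season, data = dfNew, FUN = mean)
res$period <- sprintf("11/%d-04/%d", res$season, res$season + 1L)
print(res)
```

Because the key is computed from each row's own date, missing days (including a missing 1 November) do not shift the grouping.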