Calculating mean for every second value in a dataframe

Question

Asked by xmisx on 24 June 2015 at 09:37

I would like to average each consecutive pair of values in a column and repeat this down the whole data frame. To be more precise, see the following extract:

    X         Y             Z
1   FRI 200101010000    -6.72
2   FRI 200101010030    -6.30
3   FRI 200101010100    -6.26
4   FRI 200101010130    -5.82
5   FRI 200101010200    -5.64
6   FRI 200101010230    -5.29
7   FRI 200101010300    -5.82
8   FRI 200101010330    -5.83
9   FRI 200101010400    -5.83
10  FRI 200101010430    -6.04
11  FRI 200101010500    -5.80
12  FRI 200101010530    -6.09

I would like to calculate the mean of Z for each pair of rows whose Y values end in 00 and 30, that is, the mean of rows 1 and 2, rows 3 and 4, rows 5 and 6, and so on. This is what I expect:

    X         Y             Z
1   FRI 200101010100    -6.51
2   FRI 200101010200    -6.04
3   FRI 200101010300    -5.47
...

Explanation: Y is a timestamp in the format YYYYMMDDhhmm, and I would like to average the 30-minute measurements into hourly values.

There are 4 answers

grrgrrbla On 24 June 2015 at 10:00

A base-R solution: first split the Z vector into consecutive pairs and calculate the mean of each pair, then combine the means with the X and Y values of every odd row. This of course assumes that the row order you specified always holds (each 00 row is immediately followed by its 30 row, with no gaps).

Z <- unlist(lapply(split(df$Z, ceiling(seq_along(df$Z) / 2)), mean))
new_df <- cbind(df[seq(1,nrow(df), 2), c("X", "Y")], Z)

output:

     X            Y      Z
1  FRI 200101010000 -6.510
3  FRI 200101010100 -6.040
5  FRI 200101010200 -5.465
7  FRI 200101010300 -5.825
9  FRI 200101010400 -5.935
11 FRI 200101010500 -5.945
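To see what the pairing index in the answer above does, here is a minimal, self-contained sketch. The data frame is rebuilt from the question's extract (with Y stored as character, which is an assumption about the original column type), and `pair_id` and `pair_means` are names made up for illustration. `tapply()` is used here as a one-step alternative to `split()` + `lapply()` + `unlist()`; it is not what the answer itself uses.

```r
# Sample data copied from the question; Y as character is an assumption
df <- data.frame(
  X = "FRI",
  Y = c("200101010000", "200101010030", "200101010100", "200101010130",
        "200101010200", "200101010230", "200101010300", "200101010330",
        "200101010400", "200101010430", "200101010500", "200101010530"),
  Z = c(-6.72, -6.30, -6.26, -5.82, -5.64, -5.29,
        -5.82, -5.83, -5.83, -6.04, -5.80, -6.09),
  stringsAsFactors = FALSE
)

# Pair index: rows 1-2 get 1, rows 3-4 get 2, and so on
pair_id <- ceiling(seq_along(df$Z) / 2)

# tapply() splits Z by pair_id and applies mean() to each piece
pair_means <- tapply(df$Z, pair_id, mean)

# Keep X and Y from the first row of each pair, as in the answer
new_df <- cbind(df[seq(1, nrow(df), 2), c("X", "Y")], Z = as.vector(pair_means))
```

Like the original, this relies purely on row position, so a missing half-hour reading would shift every later pair.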

JohannesNE On 24 June 2015 at 09:47

dplyr version

library(dplyr)

df$Y <- as.character(df$Y)

means <- df %>%
  group_by(hour = substr(Y, start = 1, stop = 10)) %>%
  summarise(Z = mean(Z))

> means
Source: local data frame [6 x 2]

        hour      Z
1 2001010100 -6.510
2 2001010101 -6.040
3 2001010102 -5.465
4 2001010103 -5.825
5 2001010104 -5.935
6 2001010105 -5.945

This groups the data by Y with its last two digits (the minutes) removed, so both half-hour readings of the same hour share one key.

RubenLaguna On 18 May 2017 at 19:53

Although this doesn't address the OP's exact question, in general, if you have a POSIXct column and want to aggregate by second, you can do this:

library(lubridate)
library(tidyverse)
s <- seq(from=Sys.time(), length.out = 100, by=0.4)
df  = data.frame(time=s,v=rnorm(length(s)))
df %>% 
  group_by(time=floor_date(time, "1 second")) %>% 
  summarize(v=mean(v)) # any other interval works too, e.g. "5 minutes"
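The same time-flooring idea can be applied to the question's data without extra packages. The following is only a sketch in base R: it assumes Y is stored as a character string in YYYYMMDDhhmm format and that the timestamps can be treated as UTC (the question does not state a time zone). The column names `time` and `hour` and the result name `hourly` are made up for illustration.

```r
# First four rows of the question's extract; Y as character is an assumption
df <- data.frame(
  X = "FRI",
  Y = c("200101010000", "200101010030", "200101010100", "200101010130"),
  Z = c(-6.72, -6.30, -6.26, -5.82),
  stringsAsFactors = FALSE
)

# Parse Y into a timestamp; tz = "UTC" is an assumption
df$time <- as.POSIXct(df$Y, format = "%Y%m%d%H%M", tz = "UTC")

# Floor each timestamp to the start of its hour
# (trunc() returns POSIXlt, so convert back to POSIXct for grouping)
df$hour <- as.POSIXct(trunc(df$time, units = "hours"))

# Average Z within each station and hour
hourly <- aggregate(Z ~ X + hour, data = df, FUN = mean)
```

Unlike the position-based approaches, grouping on the parsed timestamp still gives the right hourly bins if a half-hour reading is missing or the rows are out of order.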

David Arenburg On 24 June 2015 at 09:44 (Accepted Answer)

Here's a possible data.table solution

library(data.table)
setDT(df)[, .(Y = Y[1L], Z = mean(Z)), by = .(X, indx = cumsum(substr(Y, 11, 12) == '00'))]
#      X indx            Y      Z
# 1: FRI    1 200101010000 -6.510
# 2: FRI    2 200101010100 -6.040
# 3: FRI    3 200101010200 -5.465
# 4: FRI    4 200101010300 -5.825
# 5: FRI    5 200101010400 -5.935
# 6: FRI    6 200101010500 -5.945

Or, per @akrun's comment, using aggregate from base R (though the output will probably need some additional tweaking):

aggregate(Z ~ X + indx, transform(df, indx = cumsum(substr(Y, 11, 12) == '00')), mean)
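One possible way to do that tweaking is sketched below. It rebuilds the first six rows of the question's data (Y as character is an assumption), shows how the `cumsum()` index starts a new group at every row whose minutes are "00", and then attaches the first Y of each group to the aggregated result. The name `res` is made up for illustration.

```r
# First six rows of the question's extract; Y as character is an assumption
df <- data.frame(
  X = "FRI",
  Y = c("200101010000", "200101010030", "200101010100",
        "200101010130", "200101010200", "200101010230"),
  Z = c(-6.72, -6.30, -6.26, -5.82, -5.64, -5.29),
  stringsAsFactors = FALSE
)

# TRUE on every full-hour row; cumsum() turns that into a running group number
df$indx <- cumsum(substr(df$Y, 11, 12) == "00")   # 1 1 2 2 3 3

# Mean of Z per station and group
res <- aggregate(Z ~ X + indx, data = df, FUN = mean)

# match() returns the first row of each group, so this picks the full-hour Y
res$Y <- df$Y[match(res$indx, df$indx)]

# Drop the helper index and restore the original column order
res <- res[, c("X", "Y", "Z")]
```

This keeps the grouping logic from the answer unchanged and only adds the Y column back afterwards.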
