Finding the Median for every 3 values in a column

77 views Asked by At

I would like a line of code that is capable of finding the median of every 3 column values in the Fin1 column. E.g First value would be 5.08 out of 1.54, 5.08 and 5.26. The second would be 5.27 out of 2.79, 5.27 and 8.12. The whole data set contains 36 rows so this operation would have to occur 12 times.

(See data frame below)

Block trial Trial Category   Fin1    Fin2    Fin3   Fin4    Correct FastFin

2      1     2     1      1 1.5424  0.00000 0.00000 0.0000       1  1.5424

7      1     7     3      1 5.2617  0.97171 2.41070 3.8407       1  5.2617

9      1     9     2      1 5.0827  0.00000 0.00000 1.1977       1  5.0827

16     2    16     1      1 5.2732  1.28220 0.00000 3.0692       1  5.2732

19     2    19     2      1 8.1251  6.98210 1.52210 0.0000       1  8.1251

24     2    24     3      1 2.7960  1.87000 0.52903 0.0000       1  2.7960
2

There are 2 answers

0
r2evans On

The trick here is figuring out how to "group" every three consecutive rows. Once we have that, this then becomes a simple aggregate by group.

I'll use a grp variable to do that, calculating with cumsum((rownum - 1) %% 3 == 0)

dplyr

library(dplyr)
quux %>%
  group_by(grp = cumsum((row_number() - 1) %% 3 == 0)) %>%
  mutate(val = median(Fin1)) %>%
  ungroup()
# # A tibble: 6 × 12
#   Block trial Trial Category  Fin1  Fin2  Fin3  Fin4 Correct FastFin   grp   val
#   <int> <int> <int>    <int> <dbl> <dbl> <dbl> <dbl>   <int>   <dbl> <int> <dbl>
# 1     1     2     1        1  1.54 0     0      0          1    1.54     1  5.08
# 2     1     7     3        1  5.26 0.972 2.41   3.84       1    5.26     1  5.08
# 3     1     9     2        1  5.08 0     0      1.20       1    5.08     1  5.08
# 4     2    16     1        1  5.27 1.28  0      3.07       1    5.27     2  5.27
# 5     2    19     2        1  8.13 6.98  1.52   0          1    8.13     2  5.27
# 6     2    24     3        1  2.80 1.87  0.529  0          1    2.80     2  5.27

If you needed all Fin# columns, then

quux %>%
  group_by(grp = cumsum((row_number() - 1) %% 3 == 0)) %>%
  mutate(across(starts_with("Fin"), ~ median(.x), .names = "{.col}_median")) %>%
  ungroup()
# # A tibble: 6 × 15
#   Block trial Trial Category  Fin1  Fin2  Fin3  Fin4 Correct FastFin   grp Fin1_median Fin2_median Fin3_median Fin4_median
#   <int> <int> <int>    <int> <dbl> <dbl> <dbl> <dbl>   <int>   <dbl> <int>       <dbl>       <dbl>       <dbl>       <dbl>
# 1     1     2     1        1  1.54 0     0      0          1    1.54     1        5.08        0          0            1.20
# 2     1     7     3        1  5.26 0.972 2.41   3.84       1    5.26     1        5.08        0          0            1.20
# 3     1     9     2        1  5.08 0     0      1.20       1    5.08     1        5.08        0          0            1.20
# 4     2    16     1        1  5.27 1.28  0      3.07       1    5.27     2        5.27        1.87       0.529        0   
# 5     2    19     2        1  8.13 6.98  1.52   0          1    8.13     2        5.27        1.87       0.529        0   
# 6     2    24     3        1  2.80 1.87  0.529  0          1    2.80     2        5.27        1.87       0.529        0   

base R

grp <- cumsum((0:(nrow(quux)-1) %% 3) == 0)
grp
# [1] 1 1 1 2 2 2
Fins <- startsWith(names(quux), "Fin")
newFins <- paste0(Fins, "_median")
Fins
# [1] "Fin1" "Fin2" "Fin3" "Fin4"
newFins
# [1] "Fin1_median" "Fin2_median" "Fin3_median" "Fin4_median"
quux[,newFins] <- lapply(setNames(quux[,Fins], paste0(Fins, "_median")), function(val) ave(val, grp, FUN = median))
quux
#    Block trial Trial Category   Fin1    Fin2    Fin3   Fin4 Correct FastFin Fin1_median Fin2_median Fin3_median Fin4_median
# 2      1     2     1        1 1.5424 0.00000 0.00000 0.0000       1  1.5424      5.0827        0.00     0.00000      1.1977
# 7      1     7     3        1 5.2617 0.97171 2.41070 3.8407       1  5.2617      5.0827        0.00     0.00000      1.1977
# 9      1     9     2        1 5.0827 0.00000 0.00000 1.1977       1  5.0827      5.0827        0.00     0.00000      1.1977
# 16     2    16     1        1 5.2732 1.28220 0.00000 3.0692       1  5.2732      5.2732        1.87     0.52903      0.0000
# 19     2    19     2        1 8.1251 6.98210 1.52210 0.0000       1  8.1251      5.2732        1.87     0.52903      0.0000
# 24     2    24     3        1 2.7960 1.87000 0.52903 0.0000       1  2.7960      5.2732        1.87     0.52903      0.0000

Data

quux <- structure(list(Block = c(1L, 1L, 1L, 2L, 2L, 2L), trial = c(2L, 7L, 9L, 16L, 19L, 24L), Trial = c(1L, 3L, 2L, 1L, 2L, 3L), Category = c(1L, 1L, 1L, 1L, 1L, 1L), Fin1 = c(1.5424, 5.2617, 5.0827, 5.2732, 8.1251, 2.796), Fin2 = c(0, 0.97171, 0, 1.2822, 6.9821, 1.87), Fin3 = c(0, 2.4107, 0, 0, 1.5221, 0.52903), Fin4 = c(0, 3.8407, 1.1977, 3.0692, 0, 0), Correct = c(1L, 1L, 1L, 1L, 1L, 1L), FastFin = c(1.5424, 5.2617, 5.0827, 5.2732, 8.1251, 2.796)), class = "data.frame", row.names = c("2", "7", "9",  "16", "19", "24"))
0
jay.sf On

Creating a matrix with 3 rows then using matrixStats::colMedians.

> matrixStats::colMedians(matrix(quux$Fin1, nrow=3))
[1] 5.0827 5.2732

Data borrowed from r2evans.