Standard Deviation every n rows in R

I am an R beginner. There seems to be a quick and dirty way to calculate the mean of every n rows within a column, but is there something similar for the standard deviation (or standard error)? I'd like to avoid looping if possible, because this is only a small part of the increasingly unwieldy (for a beginner) code I am building. Here is a simplified example of the dataset I will be working with:

     Canopy Species    Date            Pa
1     Maple    BETH    4/26/2014 -0.1162607263
2     Maple    BETH    4/26/2014 -0.2742194706
3     Maple    BETH    4/26/2014 -0.1864006372
4     Maple    BETH    4/26/2014 -0.0739905518
5     Maple    BETH    4/26/2014 -0.0751169983
6     Maple    BETH    4/26/2014 -0.0782771938
7     Maple    BETH    4/26/2014 -0.1671646757
8     Maple    BETH    4/26/2014 -0.2464696338
9     Maple    BETH    4/26/2014 -0.2176720386
10    Maple    BETH    4/26/2014 -0.2283216397
11    Maple    BETH    4/26/2014 -0.1152989165
12    Maple    BETH    4/26/2014 -0.2720884764
13    Maple    BETH    4/26/2014 -0.1849383730
14    Maple    BETH    4/26/2014 -0.0734205199
15    Maple    BETH    4/26/2014 -0.0745294634
16    Maple    BETH    4/26/2014 -0.0776640601
17    Maple    BETH    4/26/2014 -0.1658603785
18    Maple    BETH    4/26/2014 -0.2445047320
19    Maple    BETH    4/26/2014 -0.2159337593
20    Maple    BETH    4/26/2014 -0.2264833266

And here is the piece of code I was referring to for means. It finds the mean of every 10 rows in the Pa column:

mu<-colMeans(matrix(Table$Pa, nrow=10))

Thank you in advance for your help and please let me know if there is any more information I should provide.

There are 3 answers

0
SabDeM On

Here is a mixed base R/dplyr solution. First I created a column named fac_to_spli, the grouping factor used to calculate the standard deviations, and then I did the calculations with group_by and mutate from dplyr.

library(dplyr)
# label each block of 10 rows with its first row number (1, 11, 21, ...);
# each = 10 repeats every label 10 times, length.out trims a short last block
df$fac_to_spli <- rep(seq(from = 1, to = nrow(df), by = 10), each = 10, length.out = nrow(df))
df %>% group_by(fac_to_spli) %>% mutate(stand_dev = sd(Pa))

Source: local data frame [20 x 6]
Groups: fac_to_spli [2]

   Canopy Species      Date          Pa fac_to_spli  stand_dev
   (fctr)  (fctr)    (fctr)       (dbl)       (dbl)      (dbl)
1   Maple    BETH 4/26/2014 -0.11626073           1 0.07604938
2   Maple    BETH 4/26/2014 -0.27421947           1 0.07604938
3   Maple    BETH 4/26/2014 -0.18640064           1 0.07604938
4   Maple    BETH 4/26/2014 -0.07399055           1 0.07604938
5   Maple    BETH 4/26/2014 -0.07511700           1 0.07604938
6   Maple    BETH 4/26/2014 -0.07827719           1 0.07604938
7   Maple    BETH 4/26/2014 -0.16716468           1 0.07604938
8   Maple    BETH 4/26/2014 -0.24646963           1 0.07604938
9   Maple    BETH 4/26/2014 -0.21767204           1 0.07604938
10  Maple    BETH 4/26/2014 -0.22832164           1 0.07604938
11  Maple    BETH 4/26/2014 -0.11529892          11 0.07544763
12  Maple    BETH 4/26/2014 -0.27208848          11 0.07544763
13  Maple    BETH 4/26/2014 -0.18493837          11 0.07544763
14  Maple    BETH 4/26/2014 -0.07342052          11 0.07544763
15  Maple    BETH 4/26/2014 -0.07452946          11 0.07544763
16  Maple    BETH 4/26/2014 -0.07766406          11 0.07544763
17  Maple    BETH 4/26/2014 -0.16586038          11 0.07544763
18  Maple    BETH 4/26/2014 -0.24450473          11 0.07544763
19  Maple    BETH 4/26/2014 -0.21593376          11 0.07544763
20  Maple    BETH 4/26/2014 -0.22648333          11 0.07544763
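
If one row per block is enough, and the standard error from the question is wanted too, the same grouping idea can be combined with summarise instead of mutate. This is only a minimal sketch, assuming the usual definition SE = SD / sqrt(n). The data frame is rebuilt from the Pa values in the question, and the column names block, mean_Pa, sd_Pa and se_Pa are made up for illustration:

```r
library(dplyr)

# Pa values copied from the question (20 rows)
df <- data.frame(Pa = c(
  -0.1162607263, -0.2742194706, -0.1864006372, -0.0739905518, -0.0751169983,
  -0.0782771938, -0.1671646757, -0.2464696338, -0.2176720386, -0.2283216397,
  -0.1152989165, -0.2720884764, -0.1849383730, -0.0734205199, -0.0745294634,
  -0.0776640601, -0.1658603785, -0.2445047320, -0.2159337593, -0.2264833266
))

res <- df %>%
  # hypothetical grouping column: block 1 = rows 1-10, block 2 = rows 11-20, ...
  mutate(block = ceiling(row_number() / 10)) %>%
  group_by(block) %>%
  summarise(
    mean_Pa = mean(Pa),
    sd_Pa   = sd(Pa),
    se_Pa   = sd(Pa) / sqrt(n())  # standard error of the mean (assumed definition)
  )

print(res)
```

The standard deviations should match the stand_dev column above; the only change is that each block is collapsed to a single row.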
2
Rool On

What @rawr is saying, using the dplyr package. The id is built with ceiling() so that rows 1-10 fall in group 1, rows 11-20 in group 2, and so on (round() would split the blocks unevenly and start at 0):

library(dplyr)
df %>%
  mutate(id = ceiling(row_number() / 10)) %>%
  group_by(id) %>%
  summarize(mean = mean(Pa), sd = sd(Pa))

On the data from the question this gives one row per block of 10:

      id       mean         sd
   (dbl)      (dbl)      (dbl)
1      1 -0.1663894 0.07604938
2      2 -0.1650722 0.07544763
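
When a group id is derived from the row number, it is worth checking the group sizes before trusting the summaries. Here is a small base R sketch for 20 rows comparing round() with ceiling(); the variable names are only for illustration:

```r
rows <- 1:20

# round() sends rows 1-5 to group 0 and rounds exact halves to the even number,
# so the blocks end up with unequal sizes
id_round <- round(rows / 10)

# ceiling() maps rows 1-10 to group 1 and rows 11-20 to group 2
id_ceiling <- ceiling(rows / 10)

print(table(id_round))    # group sizes 5, 9, 6
print(table(id_ceiling))  # group sizes 10, 10
```

With round(), the means and standard deviations are computed over blocks of 5, 9, 11, ... rows rather than over every 10 rows.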
3
mrip On

You can also do this with base R using by:

> n<-nrow(Table)
> index<-ceiling((1:n)/10)
> by(Table$Pa,index,mean)
index: 1
[1] -0.1663894
------------------------------------------------------------ 
index: 2
[1] -0.1650722
> by(Table$Pa,index,sd)
index: 1
[1] 0.07604938
------------------------------------------------------------ 
index: 2
[1] 0.07544763

Edit: you can put these in a table, for example, like this:

> cbind(index=unique(index), mean=by(Table$Pa,index,mean), sd=by(Table$Pa,index,sd))

  index       mean         sd
1     1 -0.1663894 0.07604938
2     2 -0.1650722 0.07544763
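
For completeness, here is a sketch that stays close to the colMeans one-liner from the question: reshape the column into a matrix with n rows, then apply sd to each column. It assumes the number of rows is a multiple of n and that the standard error is SD / sqrt(n). The names Pa, m and out are placeholders, with the Pa values copied from the question:

```r
# Pa values copied from the question (20 rows)
Pa <- c(
  -0.1162607263, -0.2742194706, -0.1864006372, -0.0739905518, -0.0751169983,
  -0.0782771938, -0.1671646757, -0.2464696338, -0.2176720386, -0.2283216397,
  -0.1152989165, -0.2720884764, -0.1849383730, -0.0734205199, -0.0745294634,
  -0.0776640601, -0.1658603785, -0.2445047320, -0.2159337593, -0.2264833266
)

n <- 10
# the reshape only lines up with "every n rows" if the length divides evenly
stopifnot(length(Pa) %% n == 0)

# matrix() fills column by column, so column j holds rows (j-1)*n + 1 ... j*n
m <- matrix(Pa, nrow = n)

out <- data.frame(
  block = seq_len(ncol(m)),
  mean  = colMeans(m),
  sd    = apply(m, 2, sd),            # per-block standard deviation
  se    = apply(m, 2, sd) / sqrt(n)   # per-block standard error (assumed definition)
)

print(out)
```

If the last block can be shorter than n, an index-based approach such as by or tapply with ceiling((1:length(Pa)) / n) is the safer choice, since matrix() would recycle or warn.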