Problem
I have a data frame called FID (see below) and I am attempting to use the package data.table to summarize my data. I want to summarise my data by:-
Desired Summarised Data frame
- Month
- Total frequency of FID per month over 3 years
- Mean frequency of FID per month over 3 years
- Standard deviation of FID per month over 3 years
- Standard error of FID per month over 3 years
- Lower confidence levels per month over 3 years
- Upper confidence levels per month over 3 years
I can perform some of these procedures separately (see below), but I would like to combine all of the information stated above in the desired data frame list (above) together into one table.
I have read extensively on Stack Overflow pages and other data.table tutorials but I cannot find any information with how to calculate the standard error, and the upper and lower confidence intervals using the package data.table. Does anyone know how to do this?
##Summary Statistics table of FID per month over 3 years
library(data.table)
##Produce a data.table object
FID.Table<-data.table(FID)
##R-code
Mean.FID<-FID_Table[, .(FID.Freq=sum(FID),
mean = mean(FID),
sd=sd(FID),
median=median(FID)),
by = .(Month)]
###Summary Statistics table
Month FID.Freq mean sd median
1: January 165 55.000000 10.535654 56
2: February 182 60.666667 29.737743 65
3: March 179 59.666667 33.291641 43
4: April 104 34.666667 16.862186 27
5: May 124 41.333333 49.571497 20
6: June 10 3.333333 5.773503 0
7: July 15 5.000000 4.358899 7
8: August 133 44.333333 21.007935 45
9: September 97 32.333333 21.548395 34
10: October 82 27.333333 13.051181 26
11: November 75 25.000000 19.000000 25
12: December 102 34.000000 4.582576 33
Data frame: FID
structure(list(Year = c(2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L), Month = structure(c(5L, 4L, 8L, 1L, 9L,
7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L,
12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L,
10L, 3L), .Label = c("April", "August", "December", "February",
"January", "July", "June", "March", "May", "November", "October",
"September"), class = "factor"), FID = c(65L, 88L, 43L, 54L,
98L, 0L, 0L, 23L, 10L, 15L, 6L, 33L, 56L, 29L, 98L, 23L, 6L,
10L, 7L, 65L, 53L, 41L, 25L, 30L, 44L, 65L, 38L, 27L, 20L, 0L,
8L, 45L, 34L, 26L, 44L, 39L)), class = "data.frame", row.names = c(NA,
-36L))
Assuming you want the number of rows in each month to be the denominator for your standard error (i.e.,
.N
), then you can use this to create 95% ci's (i.e.,* 1.96
). Alternatively, if you have missing data, you may want to usesum(!is.na(FID.Freq))
instead of.N
. In short, just calculate standard error for each month then add the ci's as columns later: