Problem

I have a data frame called FID (see below) and I am attempting to use the package data.table to summarize my data. I want to summarise my data by:-

Desired Summarised Data frame

  1. Month
  2. Total frequency of FID per month over 3 years
  3. Mean frequency of FID per month over 3 years
  4. Standard deviation of FID per month over 3 years
  5. Standard error of FID per month over 3 years
  6. Lower confidence levels per month over 3 years
  7. Upper confidence levels per month over 3 years

I can perform some of these procedures separately (see below), but I would like to combine all of the information stated above in the desired data frame list (above) together into one table.

I have read extensively on Stack Overflow pages and other data.table tutorials but I cannot find any information with how to calculate the standard error, and the upper and lower confidence intervals using the package data.table. Does anyone know how to do this?

  ##Summary Statistics table of FID per month over 3 years

   library(data.table)

  ##Produce a data.table object
    FID.Table<-data.table(FID)

   ##R-code
   Mean.FID<-FID_Table[, .(FID.Freq=sum(FID),
                        mean = mean(FID),
                        sd=sd(FID),
                        median=median(FID)), 
                        by = .(Month)]

 ###Summary Statistics table 
       Month FID.Freq      mean        sd median
 1:   January      165 55.000000 10.535654     56
 2:  February      182 60.666667 29.737743     65
 3:     March      179 59.666667 33.291641     43
 4:     April      104 34.666667 16.862186     27
 5:       May      124 41.333333 49.571497     20
 6:      June       10  3.333333  5.773503      0
 7:      July       15  5.000000  4.358899      7
 8:    August      133 44.333333 21.007935     45
 9: September       97 32.333333 21.548395     34
10:   October       82 27.333333 13.051181     26
11:  November       75 25.000000 19.000000     25
12:  December      102 34.000000  4.582576     33
    

Data frame: FID

structure(list(Year = c(2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L), Month = structure(c(5L, 4L, 8L, 1L, 9L, 
7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 
12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 
10L, 3L), .Label = c("April", "August", "December", "February", 
"January", "July", "June", "March", "May", "November", "October", 
"September"), class = "factor"), FID = c(65L, 88L, 43L, 54L, 
98L, 0L, 0L, 23L, 10L, 15L, 6L, 33L, 56L, 29L, 98L, 23L, 6L, 
10L, 7L, 65L, 53L, 41L, 25L, 30L, 44L, 65L, 38L, 27L, 20L, 0L, 
8L, 45L, 34L, 26L, 44L, 39L)), class = "data.frame", row.names = c(NA, 
-36L))
1

There are 1 answers

1
Andrew On BEST ANSWER

Assuming you want the number of rows in each month to be the denominator for your standard error (i.e., .N), then you can use this to create 95% ci's (i.e., * 1.96). Alternatively, if you have missing data, you may want to use sum(!is.na(FID.Freq)) instead of .N. In short, just calculate standard error for each month then add the ci's as columns later:

library(data.table)

setDT(FID)

Mean.FID = FID[, .(FID.Freq=sum(FID),
                   mean = mean(FID),
                   sd=sd(FID),
                   se=sd(FID) / sqrt(.N),
                   median=median(FID)), by = Month]

Mean.FID[,  `:=`(lo_ci = mean - se * 1.96, up_ci = mean + se * 1.96)]

Mean.FID
        Month FID.Freq      mean        sd        se median       lo_ci     up_ci
 1:   January      165 55.000000 10.535654  6.082763     56  43.0777854 66.922215
 2:  February      182 60.666667 29.737743 17.169094     65  27.0152431 94.318090
 3:     March      179 59.666667 33.291641 19.220938     43  21.9936289 97.339704
 4:     April      104 34.666667 16.862186  9.735388     27  15.5853064 53.748027
 5:       May      124 41.333333 49.571497 28.620117     20 -14.7620965 97.428763
 6:      June       10  3.333333  5.773503  3.333333      0  -3.2000000  9.866667
 7:      July       15  5.000000  4.358899  2.516611      7   0.0674415  9.932558
 8:    August      133 44.333333 21.007935 12.128937     45  20.5606169 68.106050
 9: September       97 32.333333 21.548395 12.440972     34   7.9490287 56.717638
10:   October       82 27.333333 13.051181  7.535103     26  12.5645314 42.102135
11:  November       75 25.000000 19.000000 10.969655     25   3.4994760 46.500524
12:  December      102 34.000000  4.582576  2.645751     33  28.8143274 39.185673