How to calculate temperature departure from normal using a 30 year average - R 4.0.0

236 views Asked by At

I have added my entire code below, so what have I verified is working:

  1. I know for a fact I can download any climate station from the NCDC website for the specific time range I want. On a side note, if you can look at my 'bind_rows()` command and make it less messy, I could not find a better way to do it.

  2. I know TAVG is calculated and working

  3. Monthly summaries, which makes the data.set mso_sum works perfectly

So what is not working:

  1. Finding my departures from 30-year norms

How do I want it to work:

  1. filter the years 1981:2010
  2. group so every January 1, Jan 2, Jan 3, etc, Feb 1, Feb 2, etc can be summarized by day
  3. summarize the mean TAVG (temperature average found from MaxT and MinT)
  4. then take the entire dataset and subtract daily TAVG from CliAvgT

This is the code I tried:

mso_light %>%
  group_by(month, day) %>%
  summarise(CliAvgT = mean(TAVG[1981:2010], na.rm = T)) %>%
  mutate(Avg_DepT = CliAvgT - TAVG) %>%
  ungroup()

I also tried this alternate code:

mso_light %>%
  filter(year >= "1981", year <= "2010") %>%
  group_by(month, day) %>%
  summarise(CliAvgT = mean(TAVG, na.rm = T)) %>%
  mutate(Avg_DepT = CliAvgT - TAVG) %>%
  ungroup()

and received this error message:

> mso_light %>%
Warning messages:
1: Unknown or uninitialised column: `value`. 
2: Unknown or uninitialised column: `value`. 
+   filter(year >= "1981", year <= "2010") %>%
+   group_by(month, day) %>%
+   summarise(CliAvgT = mean(TAVG, na.rm = T)) %>%
+   mutate(Avg_DepT = CliAvgT - TAVG) %>%
+   ungroup()
`summarise()` regrouping output by 'month' (override with `.groups` argument)
Error: Problem with `mutate()` input `Avg_DepT`.
x object 'TAVG' not found
i Input `Avg_DepT` is `CliAvgT - TAVG`.
i The error occured in group 1: month = 1.
Run `rlang::last_error()` to see where the error occurred.

Finally here is all my code:

library(rnoaa)
library(tidyverse)
library(data.table)
library("openair")
library("chron")
library('lubridate')

## grab first half of year

getNoaaP1 <- function(yr, type = c('tmax','tmin','PRCP', 'SNOW', 'SNWD')) 
  ncdc(datasetid = 'GHCND', 
       stationid = 'GHCND:USW00024153',
       datatypeid = type, 
       startdate = paste0(yr, '-01-01'), 
       enddate = paste0(yr, '-06-30'), limit = 1000)  

## grab second half of year

getNoaaP2 <- function(yr, type = c('tmax','tmin','PRCP', 'SNOW', 'SNWD')) 
  ncdc(datasetid = 'GHCND', 
       stationid = 'GHCND:USW00024153',
       datatypeid = type, 
       startdate = paste0(yr, '-07-01'), 
       enddate = paste0(yr, '-12-31'), limit = 1000)  

res1 <- setNames(lapply(1948:2020, getNoaaP1), paste0("Year", 1948:2020, "P1"))
res <- setNames(lapply(1948:2020, getNoaaP2), paste0("Year", 1948:2020, "P2"))

# this would export all individual list elements to the global environment:
list2env(res, envir = .GlobalEnv) 
list2env(res1, envir = .GlobalEnv) 

# this would combine the individual lists
mso <- bind_rows(Year1948P1$data, Year1949P1$data, Year1950P1$data, Year1951P1$data,
                 Year1952P1$data, Year1953P1$data, Year1954P1$data, Year1955P1$data,
                 Year1956P1$data, Year1957P1$data, Year1958P1$data, Year1959P1$data,
                 Year1960P1$data, Year1961P1$data, Year1962P1$data, Year1963P1$data,
                 Year1964P1$data, Year1965P1$data, Year1966P1$data, Year1967P1$data,
                 Year1968P1$data, Year1969P1$data, Year1970P1$data, Year1971P1$data,
                 Year1972P1$data, Year1973P1$data, Year1974P1$data, Year1975P1$data,
                 Year1976P1$data, Year1977P1$data, Year1978P1$data, Year1979P1$data,
                 Year1980P1$data, Year1981P1$data, Year1982P1$data, Year1983P1$data,
                 Year1984P1$data, Year1985P1$data, Year1986P1$data, Year1987P1$data,
                 Year1988P1$data, Year1989P1$data, Year1990P1$data, Year1991P1$data,
                 Year1992P1$data, Year1993P1$data, Year1994P1$data, Year1995P1$data,
                 Year1996P1$data, Year1997P1$data, Year1998P1$data, Year1999P1$data,
                 Year2000P1$data, Year2001P1$data, Year2002P1$data, Year2003P1$data,
                 Year2004P1$data, Year2005P1$data, Year2006P1$data, Year2007P1$data,
                 Year2008P1$data, Year2009P1$data, Year2010P1$data, Year2011P1$data,
                 Year2012P1$data, Year2013P1$data, Year2014P1$data, Year2015P1$data,
                 Year2016P1$data, Year2017P1$data, Year2018P1$data, Year2019P1$data,
                 Year2020P1$data,
                 Year1948P2$data, Year1949P2$data, Year1950P2$data, Year1951P2$data,
                 Year1952P2$data, Year1953P2$data, Year1954P2$data, Year1955P2$data,
                 Year1956P2$data, Year1957P2$data, Year1958P2$data, Year1959P2$data,
                 Year1960P2$data, Year1961P2$data, Year1962P2$data, Year1963P2$data,
                 Year1964P2$data, Year1965P2$data, Year1966P2$data, Year1967P2$data,
                 Year1968P2$data, Year1969P2$data, Year1970P2$data, Year1971P2$data,
                 Year1972P2$data, Year1973P2$data, Year1974P2$data, Year1975P2$data,
                 Year1976P2$data, Year1977P2$data, Year1978P2$data, Year1979P2$data,
                 Year1980P2$data, Year1981P2$data, Year1982P2$data, Year1983P2$data,
                 Year1984P2$data, Year1985P2$data, Year1986P2$data, Year1987P2$data,
                 Year1988P2$data, Year1989P2$data, Year1990P2$data, Year1991P2$data,
                 Year1992P2$data, Year1993P2$data, Year1994P2$data, Year1995P2$data,
                 Year1996P2$data, Year1997P2$data, Year1998P2$data, Year1999P2$data,
                 Year2000P2$data, Year2001P2$data, Year2002P2$data, Year2003P2$data,
                 Year2004P2$data, Year2005P2$data, Year2006P2$data, Year2007P2$data,
                 Year2008P2$data, Year2009P2$data, Year2010P2$data, Year2011P2$data,
                 Year2012P2$data, Year2013P2$data, Year2014P2$data, Year2015P2$data,
                 Year2016P2$data, Year2017P2$data, Year2018P2$data, Year2019P2$data,
                 Year2020P2$data)

## build data.frame and remove 'station ID' column
mso_light <- mso[, -3]

## remove time from date group
mso_date <- mso_light[1]
mso_date <- sub("T.*", "", mso_date$date)
mso_light$date <- mso_date 

## remove flags for fl_so? and fl_t (time)
mso_light <- mso_light[1:5]

## Change 'T' = 9998 & 'M' = 9999
mso_light$value[mso_light$fl_m == "T"] <- 0
mso_light$value[mso_light$fl_q == "M"] <- 'na'

mso_light$value <- as.numeric(mso_light$value)

## pivot data frame

## eventually use to change column names
## v_names <- c('PRCP', 'SNOW', 'SNWD', 'TMAX', 'TMIN')

mso_light <- mso_light[1:3]

mso_light <- pivot_wider(mso_light,
                          names_from = datatype,  
                          values_from = value)
## mso_light <- select(mso_light, -c('fl_m','fl_q'))

options(stringAsFactors = FALSE)

mso_light$date <- as.Date(mso_light$date, "%Y-%m-%d")


## Turning all daily temperatures into an average

mso_light <- mso_light %>% rowwise() %>%
  mutate(TAVG = mean(c(TMAX, TMIN), na.rm = T))

## Composing daily data into monthly packages

mso_light <- mso_light %>%
  mutate(month = month(date)) %>%
  mutate(year = year(date)) %>%
  mutate(day = day(date))

mso_light <- mso_light %>%
  relocate('year', 'month', 'day') 

## mso_light <- mso_light[-4]

mso_sum <- mso_light %>%
  group_by(month, year) %>% 
  summarize(AVG_TAVG=mean(TAVG, na.rm = TRUE),
          T_PRCP=sum(PRCP, na.rm=TRUE),
          T_SNOW=sum(SNOW, na.rm=TRUE)) %>% 
  ungroup()

## make 30 year averages, using 1981-2010

mso_light %>%
  group_by(month, day) %>%
  summarise(CliAvgT = mean(TAVG[1981:2010], na.rm = T)) %>%
  mutate(Avg_DepT = CliAvgT - TAVG) %>%
  ungroup()

##mso_DeptT <- mso_light %>%
##  group_by(month, day) %>%
##  mean(mso_light$TAVG[1981:2010], na.rm = T) %>%
##  ungroup()

##mso_DeptT <- filter(mso_light, year >= "1981", year <= "2010") %>%
##  group_by(day, month) %>%
##  mutate(daily_DeptT = mean(TAVG, na.rm = T)) %>%
##  ungroup()

cli_Avg <- filter(mso_sum, year >= "1981", year <= "2010") %>%
  group_by(month) %>%
  summarize(T_dep = mean(AVG_TAVG, na.rm = T),
            Mon_Precip = mean(T_PRCP, na.rm = T),
            Mon_Snow = mean(T_SNOW, na.rm = T))

write.csv(mso_light, "mso_light.csv")
write.csv(mso_sum, "mso_sum.csv")
write.csv(cli_Avg, "cli_avg.csv")
1

There are 1 answers

3
Abdessabour Mtk On

using data.table your code could be written as such. Also the issue is that summarise removes all other columns plus you sould use it like this summarise(CliAvgT = mean(TAVG, na.rm = T))

tidy way

mso_light %>%
  group_by(month, day) %>%
  summarise(CliAvgT = mean(TAVG, na.rm = T)) %>%
  ungroup() %>% right_join(mso_light) %>%
  mutate(Avg_DepT = CliAvgT - TAVG) 

using data.table syntax

this is a full reformulation of your code :

# rbinding all the data into one data.table
# lapply(c(res,res1),`[[`, "data") <=> lapply(c(res,res1),function(x) x[["data"]])
mso <- rbindlist(lapply(c(res,res1),`[[`, "data"))

# remove the station column
# remove time from date 
mso[,c("station", "date"):=list(NULL, as.Date(sub("T.+$", "", date)))]

# dcast <=> pivot_wider
# mso[,1:3] get the first three columns
# and pivot them wider 
# pivot_wider(names_from, values_from) <=> dcast(., idcol ~ names_from, value.var=values_from)
mso_light <- dcast(mso[,1:3], date ~ datatype, value.var = "value")

# calculate TAVG as we only have two values => TAVG= (TMAX+TMIN)/2
# then create the year, month day columns
mso_light[, TAVG:= (TMAX+TMIN)/2][, c("year", "month", "day") := list(year(date),month(date),day(date))]

# create a new data.table that contains the year, month and the average TAVG the sums of PRCP and SNOW by month and year
mso_sum <- mso_light[,list(year, month, AVG_TAVG=mean(TAVG, na.rm = TRUE),
          T_PRCP=sum(PRCP, na.rm=TRUE),
          T_SNOW=sum(SNOW, na.rm=TRUE)), by=c("year","month")]

# 
mso_light[, CliAvgT := mean(TAVG, na.rm = T), by=c("month", "day")][, Avg_DepT := CliAvgT - TAVG]

cli_Avg <- mso_sum[year>=1981 & year<=2020, list(T_dep = mean(AVG_TAVG, na.rm = T),
            Mon_Precip = mean(T_PRCP, na.rm = T),
            Mon_Snow = mean(T_SNOW, na.rm = T)), by=month ]

some explanations:

basically using data.table the [ has some new arguments. I will try to give a concise introduction :

  • df[, new.column:=old.col*2] : creates new.column and adds it to the current df <=> df$new.column <- df$old.col*2
  • df[col<0] returns the rows that verify col<0 i.e df[df$col<0]
  • you can mix the two and it will basically modify only the rows where the condition is verified.
  • the by argument : basically groups the data by the provided columns df[, , by=col]<=> df %>% group_by(col)
  • one last caveat is df[, c(col1,col2)] will return a vector containing the values in those column, and using list instead of c will return a data.table containing the values of the columns.

an introduction tutorial