create loop in R precipitation data defined interval start and end

487 views Asked by At

I want to analyse precipitation data from South America. My objective is to determinate onset and offset of the rainy season by using cumulative percentage rainfall of each year.

My data has the format

Date        Precip   
1939-11-01  0  
1939-11-02  0  
1939-11-03  0  
1939-11-04  4.9  
1939-11-05  0  
1939-11-06  0.7  
1939-11-07  3.5

lapply1<-read.table("lapply.txt",header=T)

lapply1$Date<-as.Date(lapply1$Date)

For a single year I do this by:

cumsum(Precip/(sum(Precip)/100)

I want to write a lopp starting at every year y-07-01 and ending at (y+1)-06-30

I tried this:

lapply(lapply, function(cumsum(lapply1$Precip/(sum(lapply1$Precip)/100)), 
    ts(lapply1, freq=4, start=c(y-01-01), end=c(y+1-06-30), names(lapply1)))

I dont know how to set the start and end of the interval and to define. Additionaly I have many NAs. Could this be a problem?

1

There are 1 answers

4
WaltS On BEST ANSWER

The R time series method ts, which you're trying to use, may be appropriate when there are an equal number of values per unit of time as in the case of months and years where there are 12 months in every year. However, since the number of days in a year changes due to leap year, for your daily data you might use the zoo package, which can represent irregular time series. Using zoo, the only issue is that you start your year in the middle rather than at the end. The following is one of way of working with that. This code also deals with NA's in the data by removing them.

library(zoo)
df <- read.table(header=TRUE, 
text="Date        Precip   
1939-11-01  0  
1939-11-02  0  
1939-11-03  0  
1939-11-04  4.9  
1939-11-05  0  
1939-11-06  0.7  
1939-11-07  3.5")
precip_data <- zoo(df[,-1, drop=FALSE], order.by=as.Date(df[,1]))

#  create data for example problem
#  remove following statements to process input data
  set.seed(123)
  precip_data <- zoo(5*rexp(365*76, 100), seq(start(precip_data), by="day", length.out=365*76))
  precip_data[as.integer(365*76*runif(10000))] <- NA
#
#  calculate start and end dates of years used to accumulate data
#
  yr_start <- as.numeric( format( start(precip_data),"%Y"))
  if( start(precip_data) < as.Date(paste(yr_start,"-07-01",sep=""))) yr_start <- yr_start-1
  yr_start <- as.Date(paste(yr_start, "-07-01",sep=""))
  yr_end <- as.numeric( format( end(precip_data),"%Y"))
  if( end(precip_data) > as.Date(paste(yr_end,"-07-01",sep=""))) yr_end <- yr_end + 1
  yr_end <- as.Date(paste(yr_end, "-07-01",sep=""))
  yr_seq <- seq(yr_start, yr_end, by="year")
#
#  replace NA's with zeroes
  precip_data_zeroes <- precip_data
  precip_data_zeroes[is.na(precip_data)] <- 0
#  accumulate precipitation for each year and then divide by end of year value to compute percentages  
  cum_precip_list <- tapply(precip_data_zeroes, cut(index(precip_data_zeroes), yr_seq), cumsum)
#  list of precipitation percents by year
  cum_precip_pct <- sapply(cum_precip_list, function(x) 100*x/as.numeric(tail(x,1)))
#  zoo daily time series of precipation percents for all years
  cum_precip_zoo <- do.call(rbind, cum_precip_pct)
#  add a column with cummulative percents to original data 
  precip_data <- merge(precipitation=precip_data, cum_pct=cum_precip_zoo)
#  save precipitation data and cummulative percents as a csv file
  write.zoo(precip_data, file="precip_data.csv", index.name="Date", sep=",")

#  set the number of plots per page
  plots_per_pg <- 3
  par(mfcol=c(plots_per_pg,1))
#  set the number of years per plot
  yrs_per_plot <- 10 
  num_yr <- length(cum_precip_pct)
  num_plts <- ceiling(num_yr/yrs_per_plot)
  for(i_plt in 1:num_plts) {
    i_yr <- (i_plt-1)*yrs_per_plot+1
    i_yr_end <- min((i_yr+yrs_per_plot-1), num_yr)
    plot(cum_precip_pct[[i_yr]], xlim= c(yr_seq[i_yr], yr_seq[i_yr_end+1]),
         xlab="Year", ylab="Cummulative Precipitation (%)", cex.lab=1.4, cex.axis=1.3, xaxt="n")
    axis(side=1, at=yr_seq[i_yr:(i_yr_end+1)], labels=yr_seq[i_yr:(i_yr_end+1)], cex.axis=1.3)
    lapply((i_yr+1):i_yr_end, function(x) lines(cum_precip_pct[[x]]))
   }

cum_precip_pct contains the cumulative percents as an R list of zoo time series for each year. The first column of precip_data contains the original precipitation data and, in the second column, the cumulative percents for the entire period.

UPDATE

The code above has been modified to replace NA's with zeroes, to add the cummulative percent data as a column to the original precipitation data, and to save this as a csv file to disk.

I'm not clear on your intention for the plotting. Putting seventy something years of daily data in one plot where every year is very similar to every other year doesn't show much detail. As a more flexible alternative, the code can now display multiple years of data on a plot with several plots per page. The number of plots per page is set at 3 and the number of years per plot is set at 10 but you can change these as need be. An example of these plots is shown below.

enter image description here