Interpolating large raster series in R

841 views Asked by At

I have 36ish years of gridded/raster monthly temperature estimates that I'd like to convert into daily estimates. For now, I'm setting the monthly estimates at the midpoint of the month and doing a simple linear interpolation. To do so, I'm attempting to use the raster::calc and stats::approx described in this question. However, in doing so, I get the following error:

Error in is.infinite(v) : default method not implemented for type 'list'

Below is some code that hopefully provides a simulation of sorts to recreate the problem. I think the problem is how NA's are dealt with because the q_interp bit at the end (where no rasters are set to NA). That said, I'm not really sure what to do with this information.

library(raster)

#The parameters of the problem
num_days = 9861
months_num = 324
num_na = 191780

#generate baseline rasters
r <- raster(nrows=360, ncols=720); 
values(r) <- NA
x <- sapply(1:months_num, function(...) setValues(r, runif(ncell(r))))

#make them a stack
s = stack(x)

#define what x coordinates the rasters refer to (e.g. loosely convert monthly to daily). Probably not the most elegant solution in the world.
num_day_month = c(31,28,31,30,31,30,31,31,30,31,30,31)
days = as.character(seq(as.Date('1989/01/01'), as.Date('2015/12/31'), by = 'day'))
months = as.character(seq(as.Date('1989/01/01'), as.Date('2015/12/01'), by = 'month'))
months = substr(months, 1,nchar(months)-3)
mid_points = as.vector(lapply(months, function(x) grep(x,days,value =T)[round(length(grep(x,days,value =T))/2)]))
mp_loc = days %in% mid_points
#output is the monthly mid points on the daily scale
mp_day_locs = (1:length(days))[mp_loc]

#make some of the cells NA throughout the whole span. In the actual dataset, the NAs generally represent oceans.
s[sample(ncell(s), num_na)] = NA

#a function to interpolate
interp_row <- function(base_indexes, value_vector, return_indexes, rule_num =2) {
  nnn = length(value_vector)
  if (any(is.na(value_vector))) {
    return(rep(NA, nnn))
  } else {
    return(approx(x = base_indexes, y= value_vector, xout = return_indexes, rule=rule_num)$y)
  }
}

#this is the function call that causes the error to be thrown
s_interp = calc(s, function(y) interp_row(base_indexes = mp_day_locs, value_vector = y, return_indexes = 1:length(days),rule_num = 2))

#Now make a without NAs-- seems to work
#generate baseline rasters
r <- raster(nrows=360, ncols=720); 
values(r) <- NA
x <- sapply(1:months_num, function(...) setValues(r, runif(ncell(r))))
#make them a stack
q = stack(x)
q_interp = calc(q, function(y) interp_row(base_indexes = mp_day_locs, value_vector = y, return_indexes = 1:length(days),rule_num = 2))
1

There are 1 answers

0
maRtin On BEST ANSWER

The problem (as far as I can see) is the length of return vector created if any value is NA. In your case its length is the same as the input vector:

# length = 324
nnn = length(value_vector)
return(rep(NA, nnn))

However the return vector created in the correct case (without NA) is much longer (daily values):

#length = 9861
return(approx(x = base_indexes, y= value_vector, xout = return_indexes, rule=rule_num)$y)

As far as I know, the two return cases need to have the same lenght. Try to change your function as follows by setting rep(NA, length(return_indexes)) :

interp_row <- function(base_indexes, value_vector, return_indexes, rule_num =2) {
  nnn = length(value_vector)
  if (any(is.na(value_vector))) {
    return(rep(NA, length(return_indexes)))
  } else {
    return(approx(x = base_indexes, y= value_vector, xout = return_indexes, rule=rule_num)$y)
  }
}

Note: Your code seems to be quite slow. One quick solution is to use the clusterR() function to speed up your code. This function enables the use of multicore processing of some raster functions like calc. Type ?clusterR or have a look here.