I have a large three dimensional array corresponding to a hyperspectral cube in spatRaster format. I also have two smaller ones (smaller area) that correspond to reference images (dark and white) used to correct values in the pixels of the cube. The smaller ones only have one line of pixels and were originally spatRasters but were transformed to array format to do some calculations. Lastly, I have a couple of constants.

For each line in the sample image, I need to subtract the line of pixels in dark image from the line of pixels in the sample image and then divide it by the line of pixels in the white image. Then, multiply by the constants and store the corrected line in a new array that eventually becomes a corrected spatRaster.

I wrote a script that does this as follows:

#hyperspectral cube
nlines_s <- 1412  #number of lines of pixels in the image
npixels_s <- 1024 #number of pixels within each line
nbands_s <- 448   #number of spectral bands within each pixel

sample_image <- terra::rast(array(runif(1),dim = c(nlines_s ,npixels_s ,nbands_s ))) 

# spatRasters already transformed to array type
dark_current <- array(runif(1),dim = c(1024,448)) 
white_reference <- array(runif(1),dim = c(1024,448))

#integration time constants
t_s <- 13
t_w <- 13

int_time <- t_w/t_s

#reflectance of reference material
R_ref <- 0.99 



#Process that needs to be speed up
reflectance_image <- array(0, c(nlines_s,npixels_s,nbands_s ))

for (fr in 1:nlines_s) {
  
      reflectance_image[fr, , ] <- as.matrix(R_ref * int_time  * ((sample_image[fr, ,]-dark_current)/white_reference) )

print(paste0("Calculating reflectance: ",round((fr/nlines_s)*100,2), "%"))      
}

ri_r <- terra::rast(reflectance_image)
names(ri_r) <- names(sample_image)

The code works as is, but it takes a long time because of the sequential nature of the for loop. I suspect this can be optimized by applying the calculation to all the lines (rows) of the image at the same time. I've been trying to do so using apply functions but I have not succeeded (might have done it wrong, though).

What would be the solution that would speed up the processing time the most?

2

There are 2 answers

4
jblood94 On BEST ANSWER

What would be the solution that would speed up the processing time the most?

This may not be the absolute most optimal, but a simple RcppArmadillo function that takes base R arrays instead of SpatRaster objects speeds it up by almost an order of magnitude. Using @RoberHijmans larger dataset (with 448 bands) and f3:

Rcpp::cppFunction(
  "arma::cube f4(arma::cube& img, const arma::mat& drk, const arma::mat& wht) {
    for(unsigned int i = 0; i < img.n_rows; i++) {
      img.row(i) -= drk;
      img.row(i) %= wht;
    }
    return(img);
  }",
  depends = "RcppArmadillo",
  plugins = "cpp11"
)

ref_image1 <- as.array(image)
system.time(ref_image1 <- f4(ref_image1, dark[1,,], R_ref*int_time/white[1,,]))
#>  user  system elapsed 
#> 14.78    0.65   15.42 
system.time(ref_image2 <- f3())
#>   user  system elapsed                   
#> 139.89    8.63  149.09

Note that the two solutions will be very slightly different due to the different order of the multiplications.

all.equal(ref_image1, as.array(ref_image2))
#> [1] "Mean relative difference: 3.520606e-08"
2
Robert Hijmans On

Here is your (adjusted) example data.

library(terra)
#terra 1.6.52
nr <- 1400
nc <- 1000
nl <- 400
set.seed(1)
image <- terra::rast(nrow=nr, ncol=nc, nlyr=nl, vals=runif(nr*nc*nl))

dark <- array(runif(nc*nl),dim = c(1, nc, nl)) 
white <- array(runif(nc*nl),dim = c(1, nc, nl))
int_time <- 1
R_ref <- 0.99 

Your approach, slightly improved

f1 <- function() {
    ref <- array(0, c(nr, nc, nl))
    for (r in 1:nr) {
         ref[r, , ] <- as.matrix(((image[r, ] - dark)/white) )
    }
    ref <- terra::rast(ref * (R_ref * int_time), ext=ext(image), crs=crs(image))
    names(ref) <- names(image)
    ref
}

A "terra" solution

f2 <- function() {
    d <- rast(dark, ext=ext(image), crs=crs(image))
    d <- disagg(d, c(nrow(image), 1))
    w <- rast(white, ext=ext(image), crs=crs(image))
    w <- disagg(w, c(nrow(image), 1))
    (R_ref * int_time) * (image - d) / w
}

The above is better but still clumsy. In terra version "terra 1.6.52" I have added support for arithmetic computations with a SpatRaster and a matrix. The columns in the matrix represent layers, the rows represent cells. I use that in f3 below.

f3 <- function() {
    dcur <- t(apply(dark, 2, c))
    wref <- t(apply(white, 2, c))
    (R_ref * int_time) * (image - dcur) / wref
}

And now with @jblood94's solution


Rcpp::cppFunction(
  "arma::cube f4(arma::cube& img, const arma::mat& drk, const arma::mat& wht) {
    for(unsigned int i = 0; i < img.n_rows; i++) {
      img.row(i) -= drk;
      img.row(i) %= wht;
    }
    return(img);
  }",
  depends = "RcppArmadillo",
  plugins = "cpp11"
)

f5 <- function() {
    ref_image1 <- as.array(image)
    ref_image1 <- f4(ref_image1, dark[1,,], R_ref*int_time/white[1,,])
    rast(ref_image1)
}

Comparison

system.time(f1())
#   user  system elapsed 
# 132.72   15.28  148.19 

system.time(f2())
#   user  system elapsed 
#  97.58   14.28  112.20 

system.time(f3())
#   user  system elapsed 
#  20.75    7.02   27.78 

system.time(f5())
#   user  system elapsed 
#  31.74    8.67   40.45 
 
ref_image1 <- as.array(image)
system.time(f4(ref_image1, dark[1,,], R_ref*int_time/white[1,,]))
#   user  system elapsed 
#  16.41    0.81   17.24 

Note that f3 depends on "terra 1.6.52". That is currently the development version. You can install it with install.packages('terra', repos='https://rspatial.r-universe.dev')

You would expect the RcppArmadillo function f4 to be faster than f3; because f3 is more general (the arrays can have any length, they do not need to consist of a single row). However, with f4 you need to first create an array, and the output needs to be put into a SpatRaster. I account for that in f5. This suggests that f3 is the fastest with these example data.

Results may also vary based on the amount of RAM available. If that becomes limiting, f3 will start writing the results to disk, and that can slow things down considerably (you have some control over this via the terraOptions()