Is there an alternative to terra patches in R for large SpatRasters?

38 views Asked by At

Overall goal: I am trying to run Circuitscape using a habitat suitability map.

I've already filtered the data to a greater than 0.80 suitability so I currently have a large SpatRaster (21373 rows and 42746 columns) with 0 and 1 cells. I need to assign patch IDs to any touching cells (8-directions) so that I can run my connectivity model!

My current problem, I'm using the terra function patches(), but it has currently been running for 3 days and has not finished and I have no way (that I know of) to track the progress.

Right now I'm trying to be patient and let the patches function run in case its just taking quite some time.

I also tried the clump() function from the raster package but it has the same time issue.

Is this something I just have to be patient for or is there another program/package that can speed up this process? I have 5 rasters to find the patches for.

I have access to ArcGIS Pro but prefer to stay in R if possible.

1

There are 1 answers

0
Nir Graham On BEST ANSWER

Here is an example of how you might estimate how long it would take to patch data of that size on given hardware. Note that, I made some sort of assumption when simulating an example raster that 10% of the values are not NA; the patches function seems to be clearly O(n^2); I predict that on my hardware running patches would take between 7 and 8 days.

one likely reasonable way forward would seem to be to reduce the resolution of the raster. You can use a model like mdl developed here to estimate how long a smaller size would take. and find an appropriate balance.

Note that running the simulations to get the estimate itself took me 16minutes. if I had done one less larger datapoint, it would have completed closer to 1 minute. you can see the polynomial difference right there.

target_rows <- 21373
target_cols <- 42746

row_frac <- target_rows/(target_cols+target_rows)

library(tidyverse)
library(terra)
set.seed(42)

# i did up to 13 - reduce to 12 if you arent as patient and want a less accurate estimate :) 
(cols <- 2^(2:13))
(rows <- (2^(2:13)/2))

list_of_rasters <- map2_df(cols,rows,
                           \(co,ro){
                             mult <- ro*co
                             cat("\nMaking size\t", mult , "\trows:\t", ro,"\tcols:\t",co)
                             rb_time <- system.time({
                               
                               r <- rast(nrows=ro, ncols=co, xmin=0)
                               
                               to_pop <- sample.int(n = mult,
                                                    size = mult / 10,
                                                    replace = FALSE)
                               r[to_pop] <- 1L
                               
                             })
                             memsize <-  capture.output(pryr::object_size(wrap(r)))
                             cat("\nMemory use\t",memsize)
                             cat("\nElapsed Seconds making the raster:\t",rb_time[["elapsed"]])
                             tibble(
                               raster=list(r),
                               rows=ro,
                               cols=co,
                               memsize_on_disk=memsize,
                               elapsed = rb_time[["elapsed"]])
                           })

list_of_rasters

patches_list <- map(list_of_rasters$raster,
                    \(r){
                      tm_seconds <- system.time({ pr <- patches(r)})
                      cat("\nElapsed Seconds making the patches:\t",tm_seconds[["elapsed"]])
                      tibble(pr=list(pr),tm_seconds=tm_seconds[["elapsed"]])
                    })
patches_list <- list_rbind(patches_list)

(all_info <- bind_cols(list_of_rasters,
                      patches_list))


(mdl <- lm(tm_seconds~poly(rows*cols,degree=2),data=all_info))
summary(mdl)

want_df <- data.frame(
  rows=target_rows,
  cols=target_cols
)
want_df$tm_seconds <- predict(mdl,newdata=want_df)

(all_info2 <- bind_rows(all_info,
                       want_df))

rows_x_cols <- all_info2$rows*all_info2$cols
plot(rows_x_cols,
     all_info2$tm_seconds)

xtc <- seq(from=0,to=target_cols,length.out=500)
xtr <- seq(from=0,to=target_rows,length.out=500)
xl <- xtc*xtr

lines(x=xl,
      y=predict(mdl,
                newdata=list(
        rows=xtr,
        cols=xtc
      )))

seconds_to_days <- function(x){
  round(x/ (60*60*24),4)
}

# how many days ? 
seconds_to_days(all_info2$tm_seconds)