I have some very large raster files and want to rescale the values of these rasters to 0-1. Here are the information of my large rasters:
class : SpatRaster
dimensions : 161239, 132862, 1 (nrow, ncol, nlyr)
resolution : 25, 25 (x, y)
extent : 2634975, 5956525, 1385025, 5416000 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs
source : mean_temp.tif
name : mean
min value : -9.1
max value : 15.4
To rescale the raster values to 0~1, I separate the raster values to three parts:
- For pixels with values <= 5th quantile, all of these pixels get values 0.
- Pixel values that are greater or equal to 95% will get 1.
- Pixel values between the 5th quantile and 95th quantile with be
normalized to 0~1.
I use terra
and sf
R packages for rescaling and make smaller tiles of the large raster.
Here is how I do it:
library(terra)
library(sf)
min_offset <- rast("test.tif")
quantiles_min <- spatSample(min_offset, 10^4, "regular")
q2 <- as.data.frame(quantile(quantiles_min, probs=c(0.05, 0.95), na.rm=TRUE))
print(q2)
#Read in data in sf format.
c_shape <- read_sf("test.shp")
# create an initial grid for centroid determination
c_grid <- st_make_grid(c_shape, cellsize = c(5e5, 5e5), square = TRUE) |>
st_as_sf()
c_grid_spat <- vect(c_grid)
## make tiles of large rasters:
filename <- paste0("/tile/tile", "_.tif")
tile <- makeTiles(min_offset,c_grid_spat, filename, overwrite = TRUE)
tile_rasters <- c(lapply(tile, rast))
Rescaling <- function(x){
##clamp to get the range of values that are need to be rescaled.
middle_min <- clamp(x,q2[1,1], q2[2,1], values = FALSE)
#use normalization to rescale the middle range values.
normalized_mid <- ((middle_min-q2[1,1])/(q2[2,1]- q2[1,1]))|> round(digits = 1)
## Values that are lower or equal to 5th quantile became 0
min_q5 <- clamp(x, upper = q2[1,1], value=FALSE)|> round(digits = 1)
min_repl5 <- init(min_q5, fun=0) |> mask(min_q5)
##values that are greater or equal to 95% get 1.
min_q95 <- clamp(x,q2[2,1],Inf, value = FALSE)|> round(digits = 1)
min_repl95 <- init(min_q95, fun=1) |> mask(min_q95)
##save raster
final_min<- merge(normalized_mid,min_repl5,min_repl95)
return(final_min)
}
x <- rast(tile_rasters[2]);Rescaling(x) #this works
MIs <- function(w){
r <- rast(w)
y <- Rescaling(r)
wrap(y,proxy=TRUE)
} #This function work. when > i=1; r <- tile_rasters[i] ;MI_tile <- MIs(r)
###Test function###
test <- MIs(tile_rasters[2])#this work with next line.
test <- terra::unwrap(test)#this work
When I tried to apply the two functions above in foreach
parallel, I encountered some errors:
library(parallelly)
library(doParallel)
library(foreach)
cl <- makeClusterPSOCK(length(tile_rasters), autoStop = TRUE)
registerDoParallel(cl)
test <- foreach(x =1:length(tile_rasters),
.packages = "terra") %dopar% {
resu <- MIs(tile_rasters[i])
return(resu)
}
stopCluster(cl)
Error message:
Error in { : task 1 failed - "NULL value passed as symbol address"
Does anyone has clues that how I can solve this problem to have my function parallelized on each raster tile?
For the test.shp file, you can find it from here, for the test.tif file, download from here.
To merge all the tile results into a final raster, I use vrt
function, but it seems to still take quite some memory space.
Here is how I did it:
t.lst <- list.files("/AllTiles/", pattern=".tif", full.names=TRUE)
MIs_final <- function(t.lst, fout="") {
r <- vrt(t.lst)
if (fout != "") {
writeRaster(r, fout, overwrite=TRUE)
fout
} else {
wrap(r)
}
}
You wrap the SpatRaster that is send back from a node, but you also need to wrap the input SpatRaster. But I would send the filenames instead. Otherwise you would probably run out of memory.
Something like this
Set up
A function that takes the file (this function should be much faster than yours; perhaps reducing the benefit of parallelization)
or easier to understand (but probably slower)
And now you can do
And a function like this
And you need to export
qnt
andqrng
to the nodesHere is an alternative approach with multiple cores