Building Species distribution modells for species within a raster stack

23 views Asked by Stone Bee At 29 February 2024 at 09:05

I have run a GLM Species Distribution Model for a single species, but I want to replicate this for a different species within a raster stack. I initially ran an SDM for one species and got an error trying to run this model on the raster stack consisting of other species. I want a function that gets a raster stack of predicted presence probabilities in the raster stack and returns a stack of predicted presence/absence for each species.

This is what I have tried:

Occurrence_data1 <- data.frame(
  lon = c(11.002470, 10.733250, 11.135831, 6.003845, 5.073000, 8.859500, 10.740000),
  lat = c(59.05563, 63.57087, 60.15113, 62.40066, 60.11600, 58.62880, 59.95000),
  species = c("B.terrestris", "B.pascuorum", "B.hortorum", "B.pratorum", "B.terrestris", "B.pascuorum", "B.hortorum")
) 

# filter out dataframe for single species 
Occurrence_data <- B_occurrences1|>
  dplyr::arrange(species)%>%
  group_by(species) %>%
  distinct(latitude, longitude, .keep_all = TRUE)%>% # to work on just one species run the next code 
  filter(species == "B. terrestris sensu lato")

#create paths 
dir.create(path = "data")
dir.create(path = "output")

#Load libraries 
pacman::p_load(terra, geodata, predicts) 

##download climate data 
bioclim_data <-geodata::worldclim_country("Norway",  path= "data/", var="bio", res =0.5) #download=T

#plot(bioclim_data[[1]])
#plot(Occurrence_data$longitude, Occurrence_data$latitude)
#Before we do any modeling, it is also a good idea to run a reality check on your occurrence data
# Download data with geodata's world function to use for our base map
world_map <- world(resolution =2.5,
                   path = "data/")

# Crop the map to our area of interest
my_map <- crop(x = world_map, y = geographic_extent)

#####prepare data for modelling
# Make an extent that is 25% larger
sample_extent <- geographic_extent * 1.25

# Crop bioclim data to desired extent
bioclim_data <- crop(x = bioclim_data, y = sample_extent)

# Plot the first of the bioclim variables to check on cropping
plot(bioclim_data[[1]])

#The pseudo-absence point or “background” points.
# Set the seed for the random-number generator to ensure results are similar
set.seed(20210707)
# Randomly sample points (same number as our observed points)
background <- spatSample(x = bioclim_data,
                         size = 10000,    # generate 10,000 pseudo-absence points
                         values = FALSE, # don't need values
                         na.rm = TRUE,   # don't sample from ocean
                         xy = TRUE)      # just need coordinates

#We have observation data and pseudo-absence data and we need to first put them into one data structure, 
#then add in the climate data so we have a single data frame with presence points and pseudo-absence points, and climate data for each of these points
# Pull out coordinate columns, x (longitude) first, then y (latitude) from 
# Bombus model data 
presence <- Occurrence_data[, c("longitude", "latitude")]
# Add column indicating presence
presence$pa <- 1

# Convert background data to a data frame
absence <- as.data.frame(background)
# Update column names so they match presence points
colnames(absence) <- c("longitude", "latitude")
# Add column indicating absence
absence$pa <- 0

# Join data into single data frame
all_points <- rbind(presence, absence)

###Add climate data
#add climate data to the coordinate data sets
bioclim_extract <- terra::extract(x = bioclim_data,
                           y = all_points[, c("longitude", "latitude")],
                           ID = FALSE) # No need for an ID column

#we need to join these extracted data back with our all_points data frame. After we do this, we do not need the latitude and longitude coordinates anymore, 
#so we can drop those two columns (at least for building the model). Add the point and climate datasets together
points_climate <- cbind(all_points, bioclim_extract)

# Identify columns that are latitude & longitude
drop_cols <- which(colnames(points_climate) %in% c("longitude", "latitude"))
drop_cols # print the values as a reality check
# Remove the geographic coordinates from the data frame
points_climate <- points_climate[, -drop_cols]

##I create a raster that has all the conditions equal to study Area
## and then use the values of the cells without NAs (cells with NA are the ocean)
## to remove NAs from points_climate 
points_climate <- na.omit(points_climate)

##Training and testing data
#We are going to reserve 20% of the data for testing and to evenly assign each point to a random group (Test /Train) using function from the predicts package
# Create vector indicating fold
fold <- folds(x = points_climate,
              k = 5,
              by = points_climate$pa)# use the pa column to create the folds 
table(fold)

#We will say that any observations in fold 1 will be testing data, and any observations in the other folds (2, 3, 4, 5) will be training data
testing <- points_climate[fold == 1, ]
training <- points_climate[fold != 1, ]

####Model Building
## Build a model using training data
glm_model <- glm(pa ~ ., data = training, family = binomial())

# Get predicted values from the model
glm_predict <- predict(bioclim_data, glm_model, type = "response")

# Print predicted values
plot(glm_predict)
 
#We now take that model, and evaluate it using the observation data and the pseudo-absence points we reserved for model testing. 
#We then use this test to establish a cutoff of occurrence probability to determine the boundaries of the Bombus  range.
# Use testing data for model evaluation
glm_eval <- pa_evaluate(p = testing[testing$pa == 1, ],
                        a = testing[testing$pa == 0, ],
                        model = glm_model,
                        type = "response")

# Determine minimum threshold for "presence"
glm_threshold <- glm_eval@thresholds$max_spec_sens

#And finally, we can use that threshold to paint a map with sites predicted to be suitable for the Bombus species
# Plot base map
plot(my_map, 
     axes = TRUE, 
     col = "grey95")

# Only plot areas where probability of occurrence is greater than the threshold
plot(glm_predict > glm_threshold, 
     add = TRUE, 
     legend = FALSE, 
     col = c(NA, "olivedrab")) # <-- Update the values HERE

# And add those observations
points(x = Occurrence_data$longitude, 
       y = Occurrence_data$latitude, 
       col = "black",
       pch = "+", 
       cex = 0.75)

# Redraw those country borders
plot(my_map, add = TRUE, border = "grey5")

#create a raster stack 
species_list <- unique(Occurrence_data1$species)

v <- terra::vect(Occurrence_data1)
r <- terra::rast(nrows = 12, ncols = 12, crs = "EPSG:4326", ext = terra::ext(v))
#And let's rasterize the points:
x <- terra::rasterize(v, r, fun = "count", by = "species")
names(Bombus_rasterstack) <- species_list

# Convert SpatRaster to a raster stack
Bombus_rasterstack-> Bombus_rasterstack %>% stack()
names(x) <- species_list ```

Original Q&A

TechQA.

Building Species distribution modells for species within a raster stack

There are 0 answers

Related Questions in R

Related Questions in RASTER

Related Questions in GLM

Related Questions in TERRA

Related Questions in MAXENT

Popular Questions

Trending Questions