GAM with "gp" smoother: how to retrieve the variogram parameters?

648 views Asked by At

I am using the following geoadditive model

library(gamair)
library(mgcv)

data(mack)    
mack$log.net.area <- log(mack$net.area)

gm2 <- gam(egg.count ~ s(lon,lat,bs="gp",k=100,m=c(2,10,1)) +
                       s(I(b.depth^.5)) +
                       s(c.dist) +
                       s(temp.20m) +
                       offset(log.net.area),
                       data = mack, family = tw, method = "REML")

Here I am using an exponential covariance function with range = 10 and power = 1 (m=c(2,10,1)). How can I retrieve from the results the variogram parameters (nugget, sill)? I couldn't find anything in the model output.

1

There are 1 answers

0
Zheyuan Li On

In smoothing approach the correlation matrix is specified so you only estimate variance parameter, i.e., the sill. For example, you've set m = c(2, 10, 1) to s(, bs = 'gp'), giving an exponential correlation matrix with range parameter phi = 10. Note that phi is not identical to range, except for spherical correlation. For many correlation models the actual range is a function of phi.

The variance / sill parameter is closely related to the smoothing parameter in penalized regression, and you can obtain it by dividing the scale parameter by smoothing parameter:

with(gm2, scale / sp["s(lon,lat)"])
#s(lon,lat) 
#  26.20877 

Is this right? No. There is a trap here: smoothing parameters returned in $sp are not real ones, and we need the following:

gm2_sill <- with(gm2, scale / sp["s(lon,lat)"] * smooth[[1]]$S.scale) 
#s(lon,lat) 
#    7.7772 

And we copy in the range parameter you've specified:

gm2_phi <- 10

The nugget must be zero, since a smooth function is continuous. Using lines.variomodel function from geoR package, you can visualize the semivariogram for the latent Gaussian spatial random field modeled by s(lon,lat).

library(geoR)
lines.variomodel(cov.model = "exponential", cov.pars = c(gm2_sill, gm2_phi),
                 nugget = 0, max.dist = 60)
abline(h = gm2_sill, lty = 2)

enter image description here

However, be skeptical on this variogram. mgcv is not an easy environment to interpret geostatistics. The use of low-rank smoothers suggests that the above variance parameter is for parameters in the new parameter space rather than the original one. For example, there are 630 unique spatial locations in the spatial field for mack dataset, so the correlation matrix should be 630 x 630, and the full random effects should be a vector of length-630. But by setting k = 100 in s(, bs = 'gp') the truncated eigen decomposition and subsequent low-rank approximation reduce the random effects to length-100. The variance parameter is really for this vector not the original one. This might explain why the sill and the actual range do not agree with the data and predicted s(lon,lat).

## unique locations
loc <- unique(mack[, c("lon", "lat")])

max(dist(loc))
#[1] 15.98

The maximum distance between two spatial locations in the dataset is 15.98, but the actual range from the variogram seems to be somewhere between 40 and 60, which is too large.

## predict `s(lon, lat)`, using the method I told you in your last question
## https://stackoverflow.com/q/51634953/4891738
sp <- predict(gm2,
              data.frame(loc, b.depth = 0, c.dist = 0, temp.20m = 0,
                         log.net.area = 0),
              type = "terms", terms = "s(lon,lat)")

c(var(sp))
#[1] 1.587126

The predicted s(lon,lat) only has variance 1.587, but the sill at 7.77 is way much higher.