Regression kriging of binomial data

2.3k views Asked by At

I use gstat to predict a binomial data, but the predicted values go above 1 and below 0. Does anyone know how I can deal with this issue? Thanks.

data(meuse)
data(meuse.grid)
coordinates(meuse) <- ~x+y
coordinates(meuse.grid) <- ~x+y
gridded(meuse.grid) <- TRUE

#glm model
glm.lime <- glm(lime~dist+ffreq, meuse, family=binomial(link="logit"))
summary(glm.lime)

#variogram of residuals
var <- variogram(lime~dist+ffreq, data=meuse)
fit.var <- fit.variogram(var, vgm(nugget=0.9, "Sph", range=sqrt(diff(meuse@bbox\[1,\])^2 + diff(meuse@bbox\[2,\])^2)/4, psill=var(glm.lime$residuals)))   
plot(var, fit.var, plot.nu=T)

#universal kriging
kri <- krige(lime~dist+ffreq, meuse, meuse.grid, fit.var)
spplot(kri[1])

enter image description here

1

There are 1 answers

0
Paul Hiemstra On BEST ANSWER

In general, with this kind of regression kriging approach there is no guarantee that the model will be valid as the calculation of the trend and the residuals is separated. A few notes on your code. Notice that you use variogram to calculate the residual variogram, but variogram uses a normal linear model to calculate the trend and thus also the residuals. You need to determine your residuals from your glm, and then calculate a residual variogram based on that.

You could do this manually, or have a look at the fit.gstatModel function from the GSIF package. You could also have a look at binom.krige from the geoRglm package. This thread on R-sig-geo might also be interesting:

Taking residuas from a GLM is rather different from using indicator variables. Also there may be even some differences depending on which kind of GLM residuals you take. Run a GLM and exploring the residuals e.g. via variograms, is something I consider a routine practice, but it does not aways tell you the whole story. Fitting a GLGM (generealised linear geostatitical model) can be more conclusive since you can do infereces on the model parameters and access the relevance of the spatial term more objectively. This was the original motivation for geoRglm doing all the modelling at once and not by two steps such as fitting a model without correlation and then modelling residuals. This came with the extra burden of calibrating the MCMC algorithms. Later spBayes came to scene and indeed looks promissing proposing a more general framework whereas geoRglm is rather specific to univariate binomial and poison models.

As Roger says there is scope to play around with other alternatives like the GLMM or maybe MCMCpack, but this is certainly not ready "out-of-the-box" and code will need to be adapted for spatial purposes.