microsimulation GLM including stochastic part

141 views Asked by At

I'm trying to simulate GLM functions in R including stochastic uncertainty. I compared a formula-based approach to the R-based simulate() function and get different results. Not sure what I (probably its me and not R) am doing wrong.

I start by creating a simulation cohort:

set.seed(1)
library(MASS)

d <- mvrnorm(n=3000, mu=c(30,12,60), Sigma=matrix(data=c(45, 5, 40, 5, 15, 13, 40, 13, 300), nrow=3))
d[,1] <- d[,1]^2

Fit the model:

m <- glm(formula=d[,1]~d[,2] + d[,3], family=gaussian(link="sqrt"))

predict on linear scale (using formula)

p_linear <- m$coefficients[1] + m$coefficients[2]*d[,2] + m$coefficients[3]*d[,3]

Compare both approaches' predictions including the inverse link function. It seems to be similar:

sum((predict(object=m, type="response")-p_linear^2)^2)==0

Add stochastic part to both predictions:

sd_residuals <- sd(p_linear^2 - d[,1])
p <- p_linear^2 + rnorm(n=1000, mean=0, sd=sd_residuals)
p_simulate <- simulate(m)$sim_1

Compare both predictions (formula-based and simulate() based) to original data in plot:

par(mfrow=c(1,2), mar=c(4,2,2,1))
plot(d[,1], p, col="blue", pch=20, cex=0.5, xlim=c(-500, 3000), ylim=c(-500, 3000))
abline(lm(p~d[,1]), col="blue")
points(d[,1], p_simulate, col="red", cex=0.5)
abline(lm(p_simulate~d[,1]), col="red")
abline(a=0, b=1)

...and compare both predictions:

plot(p, p_simulate, col="green", cex=0.5, xlim=c(-500, 3000), ylim=c(-500, 3000))
abline(lm(p_simulate~p), col="green")
abline(a=0, b=1)

It seems that the formula-based predictions reflect similar stochastic uncertainty as the original data. The r-based simulate() function approach seems to represent less stochastic uncertainty than the original data. Is the formula-based approach correct? If not, how should I adapt it? (out of interest: what causes the difference between both approaches?)

Eventually, I want to use the GLM model on other than the data as input values for the predictions. I also want to apply it in Excel. Therefore, I'm looking for a formula-based approach and not using R-packages like simulate().

Any tips on reading material for predictions including stochastic in GLM in R are welcome. I googled a lot, but have difficulties finding formula-based examples in R for other than family/link is Gaussian/identity.

0

There are 0 answers