I studied STAN some years ago, when the tutorial model was 8schools.stan, then got busy with other things for several years. I'm now back trying to relearn STAN. Now the tutorial model is just schools.stan. I have run these two versions of the same basic model, setting the seed to the same value. I get two results that are quite similar, but not identical, but with strikingly different values for lp__.
The only difference between 8schools.stan and schools.stan is in the model section. The diff of the two files is:
[c:\Larry\R-Spaces\STAN]# diff 8schools.stan school.stan
7,18c17,18
eta ~ normal(0, 1);
y ~ normal(theta, sigma);
--
target += normal_lpdf(eta | 0, 1);
target += normal_lpdf(y | theta, sigma);
9a20
As I understand it, these two model statements are equivalent. I ran the two models using the same schools_dat data set given in the tutorial, using the following STAN invocation, changing only fit1 to fit2 and changing the STAN file from 8schools.stan to schools.stan for the two runs.
fit2 <- stan(
file = "schools.stan", # Stan program
data = schools_dat, # named list of data
chains = 4, # number of Markov chains
warmup = 1000, # number of warmup iterations per chain
iter = 2000, # total number of iterations per chain
cores = 4, # number of cores (using 2 just for the vignette)
refresh = 1000, # show progress every 'refresh' iterations
seed = 5
)
Results for 8schools:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu 8.07 0.12 5.12 -1.57 4.73 7.92 11.25 19.01 1839 1
tau 6.54 0.14 5.55 0.20 2.45 5.19 9.06 21.00 1491 1
eta[1] 0.37 0.01 0.92 -1.45 -0.24 0.39 0.98 2.12 4000 1
...
...
theta[8] 8.68 0.14 8.03 -5.68 3.79 8.15 12.92 26.57 3403 1
lp__ -4.79 0.07 2.51 -10.25 -6.37 -4.57 -3.04 -0.41 1202 1
and for schools.stan:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu 8.04 0.19 5.25 -2.05 4.76 7.84 11.20 18.65 730 1.00
tau 6.34 0.20 5.46 0.22 2.33 5.00 8.86 21.39 724 1.01
eta[1] 0.35 0.02 0.94 -1.56 -0.28 0.38 0.99 2.12 3071 1.00
...
...
theta[8]8.43 0.15 7.63 -6.59 3.78 8.13 12.63 25.05 2742 1.00
lp__ -39.67 0.07 2.60 -45.31 -41.23 -39.45 -37.81 -35.25 1336 1.00
The results of the two models are quite close, but not identical, with the exception of lp__, which is quite different. I suspect that the two models compiled slightly differently, so that the seed didn't give identical values. But are these two model statements really identical? Aside from the minor differences in the estimated parameters -- well within the variability expected from sampling (but note the identical seed), the striking difference is in the value of lp__.
What's going on here? Thanks in advance to anyone that can clarify this issue for me.
These days, the version with
~
drops any constants (the square root of two pi in the case of normal) in the density functions while the version with+=
keeps them. There should not be any systematic difference in the parameter estimates, although they will not identical if you do not set the pseudo-random number generator seed.