I reproduced the results of a hierarchical model using the rethinking package with just rstan() and I am just curious why n_eff is not closer.
Here is the model with random intercepts for 2 groups (intercept_x2) using the rethinking package:
Code:
response = c(rnorm(500,0,1),rnorm(500,200,10))
predicotr1_continuous = rnorm(1000)
predictor2_categorical = factor(c(rep("A",500),rep("B",500) ))
data = data.frame(y = response, x1 = predicotr1_continuous, x2 = predictor2_categorical)
head(data)
library(rethinking)
m22 <- map2stan(
alist(
y ~ dnorm( mu , sigma ) ,
mu <- intercept + intercept_x2[x2] + beta*x1 ,
intercept ~ dnorm(0,10),
intercept_x2[x2] ~ dnorm(0, sigma_2),
beta ~ dnorm(0,10),
sigma ~ dnorm(0, 10),
sigma_2 ~ dnorm(0,10)
) ,
data=data , chains=1 , iter=5000 , warmup=500 )
precis(m22, depth = 2)
Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
intercept 9.96 9.59 -5.14 25.84 1368 1
intercept_x2[1] -9.94 9.59 -25.55 5.43 1371 1
intercept_x2[2] 189.68 9.59 173.28 204.26 1368 1
beta 0.06 0.22 -0.27 0.42 3458 1
sigma 6.94 0.16 6.70 7.20 2927 1
sigma_2 43.16 5.01 35.33 51.19 2757 1
Now here is the same model in rstan():
# create a numeric vector to indicate the categorical groups
data$GROUP_ID = match( data$x2, levels( data$x2 ) )
library(rstan)
standat <- list(
N = nrow(data),
y = data$y,
x1 = data$x1,
GROUP_ID = data$GROUP_ID,
nGROUPS = 2
)
stanmodelcode = '
data {
int<lower=1> N;
int nGROUPS;
real y[N];
real x1[N];
int<lower=1, upper=nGROUPS> GROUP_ID[N];
}
transformed data{
}
parameters {
real intercept;
vector[nGROUPS] intercept_x2;
real beta;
real<lower=0> sigma;
real<lower=0> sigma_2;
}
transformed parameters { // none needed
}
model {
real mu;
// priors
intercept~ normal(0,10);
intercept_x2 ~ normal(0,sigma_2);
beta ~ normal(0,10);
sigma ~ normal(0,10);
sigma_2 ~ normal(0,10);
// likelihood
for(i in 1:N){
mu = intercept + intercept_x2[ GROUP_ID[i] ] + beta*x1[i];
y[i] ~ normal(mu, sigma);
}
}
'
fit22 = stan(model_code=stanmodelcode, data=standat, iter=5000, warmup=500, chains = 1)
fit22
Inference for Stan model: b212ebc67c08c77926c59693aa719288.
1 chains, each with iter=5000; warmup=500; thin=1;
post-warmup draws per chain=4500, total post-warmup draws=4500.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
intercept 10.14 0.30 9.72 -8.42 3.56 10.21 16.71 29.19 1060 1
intercept_x2[1] -10.12 0.30 9.73 -29.09 -16.70 -10.25 -3.50 8.36 1059 1
intercept_x2[2] 189.50 0.30 9.72 170.40 182.98 189.42 196.09 208.05 1063 1
beta 0.05 0.00 0.21 -0.37 -0.10 0.05 0.20 0.47 3114 1
sigma 6.94 0.00 0.15 6.65 6.84 6.94 7.05 7.25 3432 1
sigma_2 43.14 0.09 4.88 34.38 39.71 42.84 46.36 53.26 3248 1
lp__ -2459.75 0.05 1.71 -2463.99 -2460.68 -2459.45 -2458.49 -2457.40 1334 1
Samples were drawn using NUTS(diag_e) at Thu Aug 31 15:53:09 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
My Questions:
- the n_eff is larger using rethinking(). There is simulation differences but do you think something else is going on here?
- Besides the n_eff being different the percentiles of the posterior distributions are different. I was thinking rethinking() and rstan() should return similar results with 5000 iterations since rethinking is just calling rstan. Are differences like that normal or something different between the 2 implementations?
- I created data$GROUP_ID to indicate the categorical groupings. Is this the correct way to incorporate categorical variables into a hierarchical model in rstan()? I have 2 groups and if I had 50 groups I use the same data$GROUP_ID vector but is that the standard way?
Thank you.