I am trying to run a Bayesian clustering model where the number of clusters is random with binomial distribution. This is my Jags model:
model{
for(i in 1:n){
y[ i ,1:M] ~ dmnorm( mu[z[i] , 1:M] , I[1:M, 1:M])
z[i] ~ dcat(omega[1:M])
}
for(j in 1:M){
mu[j,1:M] ~ dmnorm( mu_input[j,1:M] , I[1:M, 1:M] )
}
M ~ dbin(p, Mmax)
omega ~ ddirich(rep(1,Mmax))
}
to run it, we need to define the parameters anche the initial values for the variables, which is done in this R script
Mmax=10
y = matrix(0,100,Mmax)
I = diag(Mmax)
y[1:50,] = mvrnorm(50, rep(0,Mmax), I)
y[51:100,] = mvrnorm(50, rep(5,Mmax), I)
plot(y[,1:2])
z = 1*((1:100)>50) + 1
n = dim(y)[1]
M=2
mu=matrix(rnorm(Mmax^2),nrow=Mmax)
mu_input=matrix(2.5,Mmax,Mmax) ### prior mean
p=0.5
omega=rep(1,Mmax)/Mmax
data = list(y = y, I = I, n = n, mu_input=mu_input, Mmax = Mmax, p = p)
inits = function() {list(mu=mu,
M=M,
omega = omega) }
require(rjags)
modelRegress=jags.model("cluster_variabile.txt",data=data,inits=inits,n.adapt=1000,n.chains=1)
however, running the last command, one gets
Error in jags.model("cluster_variabile.txt", data = data, inits = inits,
: RUNTIME ERROR: Compilation error on line 6.
Unknown variable M Either supply values
for this variable with the data or define it on the left hand side of a relation.
which for me makes no sense, since the error is at line 6 even if M already appears at line 4 of the model! What is the actual problem in running this script?
So JAGS is not like R or other programming procedural languages in that it doesn't actually run line by line, it is a declarative language meaning the order of commands doesn't actually matter at least in terms of how the errors pop up. So just because it didn't throw an error on line 4 doesn't mean something isn't also wrong there. Im not positive, but I believe the error is occuring because JAGS tries to build the array first before inputting values, so M is not actually defined at this stage, but nothing you can do about that on your end.
With that aside, there should be a fairly easy work around for this, it is just less efficient. Instead of looping from
1:M
make the loop iterate from1:MMax
that way the dimensions don't actually change, it is always an MMax x MMax. Then line 7 just assigns 1:M of those positions to a value. The downside of this is that it will require you to do some processing after the model is fit. So on each iteration, you will need to pull the sampled M and filter the matrix mu to be M x M, but that shouldn't be too tough. Let me know if you need more help.