I'm new to Stan and probabilistic programming. I'm trying to construct a non-linear growth model. I've been able to construct the model in NLS.
The NLS formula I used is: Trump_Pct ~ alpha - beta * lambda^Population
My NLS summary is:
    Parameters:
            Estimate  Std. Error  t value  Pr(>|t|)
    alpha  5.627e+01   2.053e+00    27.41    <2e-16 ***
    beta   3.018e+01   1.974e+00    15.29    <2e-16 ***
    lambda 9.981e-01   2.486e-04  4014.47    <2e-16 ***
In other words, a basic exponential decay curve. I'm trying to replicate it in Stan.
My data is as follows:
I have N observations in the dataset. The predictor is the population of a county ("Population") and the response Y is the percentage of the vote that went to Trump ("Trump_Pct").
I have tried two ways of constructing this model.
In one, I pass each data component to the model as a vector.
In the other, I leave each data component as a list and attempt to use each data point individually.
I'm not able in either case to get the model to run successfully.
Here are my models:
Case 1:
This is an adaptation of this model.
Here I've created vectorized versions of the columns Trump_Pct and Population.
    data {
      int N;
      vector[N] PopulationV;
      vector[N] Trump_PctV;
    }
    parameters {
      vector[1] alpha;
      vector[1] beta;
      vector[1] lambda;
      real<lower=0> sigma;
    }
    model {
      vector[N] ypred;
      ypred = alpha[1] - beta[1] * (lambda[1]^PopulationV);
      Trump_PctV ~ ypred + sigma;
    }
This model fails at the line with the exponent for the following reason:
`SYNTAX ERROR, MESSAGE(S) FROM PARSER:
arguments to ^ must be primitive (real or int); cannot exponentiate real by vector in block=local`
I've tried using pow() but can't find a way forward. Any tips?
Case 2:
    data {
      int<lower=0> N;
      real<lower=0> Population[N];
      real<lower=0> Trump_Pct[N];
    }
    parameters {
      real alpha;
      real beta;
      real<lower=3, upper=4> lambda;
      real<lower=0> tau;
    }
    transformed parameters {
      real sigma;
      sigma = 1 / sqrt(tau);
    }
    model {
      real m[N];
      for (i in 1:N)
        m[i] = alpha - beta * pow(lambda, Population[i]);
      Trump_Pct ~ normal(m, sigma);
      alpha ~ normal(10, 20);
      beta ~ normal(5, 10);
      lambda ~ uniform(3, 4);
      tau ~ gamma(.0001, .0001);
    }
In case 2, I am not able to keep the parameter estimates within bounds:
"Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:"
[2] "Exception thrown at line 21: normal_log: Location parameter[2873] is -inf, but must be finite!"
Can anyone offer any advice on a simple non-linear model for my formula?
Your case 2 is the correct syntax. As you discovered, neither `^` nor `pow` accepts vector arguments, so you have to loop over the elements.
The informational message you see is due to numerical overflow, and should not cause the sampler to stop. There is more detail about that message here.
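The looping approach can also be applied to the vectorized data layout from case 1. The following is only a sketch, not a tested model. It keeps the data names from the question, replaces the one-element vectors with plain `real` parameters, and writes the likelihood as an explicit `normal` sampling statement. The bounds on `lambda` are an assumption taken from the NLS estimate of about 0.998, not something the question specifies.

```stan
data {
  int<lower=0> N;
  vector[N] PopulationV;
  vector[N] Trump_PctV;
}
parameters {
  real alpha;
  real beta;
  // Assumption: a decay rate between 0 and 1, in line with the NLS fit.
  real<lower=0, upper=1> lambda;
  real<lower=0> sigma;
}
model {
  vector[N] ypred;
  // pow() is applied one element at a time, since it rejected
  // a vector exponent in the error shown in the question.
  for (i in 1:N)
    ypred[i] = alpha - beta * pow(lambda, PopulationV[i]);
  // The mean and the noise scale are passed to normal() explicitly.
  Trump_PctV ~ normal(ypred, sigma);
}
```

With no explicit priors, `alpha` and `beta` get improper flat priors here, so in practice you would probably want to add some.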
It is possible that the sampler cannot get going, in which case you can pass the `init_r` argument to `stan` or `sampling` and set it to a value less than its default of 2. This affects the width of the uniform interval from which initial values are drawn in the unconstrained space.
If there are many overflow messages, it is quite possible that you have other problems as well, such as divergent transitions, which are also covered at the above link. The ultimate solution probably involves rescaling the data, reparameterizing the model, and/or tightening the priors.
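As a rough illustration of rescaling and tighter priors, here is one way case 2 could be rewritten. Treat it as a sketch under assumptions rather than a verified fix. The divisor `1000.0`, the name `pop_scaled`, and every prior below are hypothetical choices, loosely centered on the NLS estimates. Constraining `lambda` to (0, 1) instead of (3, 4) is also an assumption, based on the NLS estimate. With `lambda` of 3 or more, `pow(lambda, Population[i])` exceeds double precision once the population reaches a few hundred, which would explain the `-inf` location parameter.

```stan
data {
  int<lower=0> N;
  vector<lower=0>[N] Population;
  vector<lower=0>[N] Trump_Pct;
}
transformed data {
  // Hypothetical rescaling; pick a divisor that suits the units of your data.
  vector[N] pop_scaled;
  pop_scaled = Population / 1000.0;
}
parameters {
  real alpha;
  real beta;
  real<lower=0, upper=1> lambda;  // decay per rescaled unit of population
  real<lower=0> sigma;            // sampled directly instead of via tau
}
model {
  vector[N] m;
  for (i in 1:N)
    m[i] = alpha - beta * pow(lambda, pop_scaled[i]);
  // Assumed weakly informative priors; adjust to your own prior knowledge.
  alpha ~ normal(50, 20);
  beta ~ normal(30, 20);
  sigma ~ normal(0, 10);          // half-normal because sigma is bounded below
  Trump_Pct ~ normal(m, sigma);
}
```

After rescaling, `lambda` is no longer directly comparable to the NLS estimate, because it now describes the decay per 1000 units of the original predictor. The declared bounds already give `lambda` an implicit uniform prior on (0, 1), so no separate statement is needed for it.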