The original can be found at https://discovery.ucl.ac.uk/id/eprint/16040/1/16040.pdf
We assume the number of goals follow the Poisson distribution
Where the log of goals is computed as
With the parameters follow these specific distributions
This is the graphical representation
And here is the data
data_set = read.csv('https://www.football-data.co.uk/mmz4281/2223/E0.csv')
data_set = data_set[, c('HomeTeam', 'AwayTeam', 'FTHG', 'FTAG')]
This is my code in rjags
which works well:
teams = unique(c(data_set$HomeTeam, data_set$AwayTeam))
data_jags = list(FTHG = data_set$FTHG, FTAG = data_set$FTAG,
HomeTeam = as.numeric(factor(data_set$HomeTeam, levels=teams)),
AwayTeam = as.numeric(factor(data_set$AwayTeam, levels=teams)),
n_teams = length(teams), n_games = nrow(data_set))
bayes_mod_string = "model {
for(i in 1:n_games) {
FTHG[i] ~ dpois(lambda_home[HomeTeam[i],AwayTeam[i]])
FTAG[i] ~ dpois(lambda_away[HomeTeam[i],AwayTeam[i]])
}
for(home_i in 1:n_teams) {
for(away_i in 1:n_teams) {
lambda_home[home_i, away_i] = exp(home_adv + att[home_i] - def[away_i])
lambda_away[home_i, away_i] = exp(att[away_i] - def[home_i])
}
}
for(j in 1:n_teams) {
att.star[j] ~ dnorm(mu, tau)
def.star[j] ~ dnorm(mu, tau)
att[j] = att.star[j] - mean(att.star[])
def[j] = def.star[j] - mean(def.star[])
}
home_adv ~ dnorm(0.2, 0.0625)
mu ~ dnorm(0, 0.0625)
tau ~ dgamma(0.01, 0.01)
}"
bayes_mod = jags.model(textConnection(bayes_mod_string), data = data_jags, n.chains = 3, n.adapt = 1e3)
update(bayes_mod, 1e3)
bayes_sim = coda.samples(bayes_mod, n.iter = 1e3, thin = 100,
variable.names = c("home_adv", "att", 'def', "mu", "tau"))
bayes_csim = as.mcmc(do.call(bayes_sim))
And below my code in rstan
:
// Data
data {
int n_games;
int n_teams;
int FTHG[n_games];
int FTAG[n_games];
int HomeTeam[n_games];
int AwayTeam[n_games];
}
// Params
parameters {
real<lower=0> home_adv;
real att;
real def;
real mu;
real tau;
}
// Model
model {
for (i in 1:n_games)
FTHG[i] ~ dpois(lambda_home[HomeTeam[i],AwayTeam[i]]);
FTAG[i] ~ dpois(lambda_away[HomeTeam[i],AwayTeam[i]]);
for (home_i in 1:n_teams) {
for(away_i in 1:n_teams) {
lambda_home[home_i, away_i] = exp(home_adv + att[home_i] - def[away_i]);
lambda_away[home_i, away_i] = exp(att[away_i] - def[home_i]);
}
}
for (j in 1:n_teams) {
att.star[j] ~ dnorm(mu, tau);
def.star[j] ~ dnorm(mu, tau);
att[j] = att.star[j] - mean(att.star[]) ;
def[j] = def.star[j] - mean(def.star[]) ;
}
home_adv ~ dnorm(0.2, 0.0625);
mu ~ dnorm(0, 0.0625);
tau ~ dgamma(0.01, 0.01);
}
Some preparations
teams = unique(c(data_set$HomeTeam, data_set$AwayTeam))
n_games = 380
n_teams = 20
FTHG = data_set$FTHG
FTAG = data_set$FTAG
HomeTeam = as.numeric(factor(data_set$HomeTeam, levels = teams))
AwayTeam = as.numeric(factor(data_set$AwayTeam, levels = teams))
I then save it as draft.stan
, run by stan_model('draft.stan')
but got this error message:
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
0
Syntax error in 'string', line 35, column 8, lexing error:
Invalid character found.
I think it's because of the sign =
? I tried transformed parameters
but it didn't work too.
Part of the problem is that the Stan code posted here includes a lot of JAGS syntax that isn't valid in Stan. (It's possible that this explains all of the problems; I've never done JAGS, so I can't say for sure.) Be sure to read the Stan documentation; Stan syntax is different from JAGS syntax in a lot of ways, so just copying large blocks of JAGS code into Stan is very unlikely to work.
Some of the changes to the original code that were needed include the following:
poisson
,normal
, etc., notdpois
,dnorm
, etc. See this page in the documentation.att
parameter for each team, it has to be declared explicitly as a vector. Similarly fordef
.lambda_home
andlambda_away
have to be defined in thetransformed parameters
block. They also have to be explicitly declared.Here's a complete Stan program that compiles and runs.
This code does run, but it's not necessarily what you want. When I ran it on your data, I got a handful of divergent transitions, which suggests that something about the model specification could be improved. I tried switching to a non-centered parameterization (which is possibly what your original JAGS program did; is that what the
star
s are about?). But, surprisingly, that made things even worse!At any rate, this program runs, so it should be a good starting point for further exploration.