How to implement Bayesian football model in RSTAN? I can do it in RJAGS but in RSTAN

67 views Asked by At

The original can be found at https://discovery.ucl.ac.uk/id/eprint/16040/1/16040.pdf

We assume the number of goals follow the Poisson distribution

enter image description here

Where the log of goals is computed as enter image description here enter image description here

With the parameters follow these specific distributions enter image description here enter image description here

This is the graphical representation enter image description here

And here is the data

data_set = read.csv('https://www.football-data.co.uk/mmz4281/2223/E0.csv')
data_set = data_set[, c('HomeTeam', 'AwayTeam', 'FTHG', 'FTAG')]

This is my code in rjags which works well:

teams = unique(c(data_set$HomeTeam, data_set$AwayTeam))

data_jags = list(FTHG = data_set$FTHG, FTAG = data_set$FTAG, 
                 HomeTeam = as.numeric(factor(data_set$HomeTeam, levels=teams)),
                 AwayTeam = as.numeric(factor(data_set$AwayTeam, levels=teams)),
                 n_teams = length(teams), n_games = nrow(data_set))

bayes_mod_string = "model {

    for(i in 1:n_games) {
      FTHG[i] ~ dpois(lambda_home[HomeTeam[i],AwayTeam[i]])
      FTAG[i] ~ dpois(lambda_away[HomeTeam[i],AwayTeam[i]])
    }
    
    for(home_i in 1:n_teams) {
      for(away_i in 1:n_teams) {
        lambda_home[home_i, away_i] = exp(home_adv + att[home_i] - def[away_i])
        lambda_away[home_i, away_i] = exp(att[away_i] - def[home_i])
      }
    }
    
    for(j in 1:n_teams) {
      att.star[j] ~ dnorm(mu, tau)
      def.star[j] ~ dnorm(mu, tau)
      att[j] = att.star[j] - mean(att.star[]) 
      def[j] = def.star[j] - mean(def.star[]) 
    }  
    
    home_adv ~ dnorm(0.2, 0.0625)
    mu ~ dnorm(0, 0.0625)
    tau ~ dgamma(0.01, 0.01)
    
}"


bayes_mod = jags.model(textConnection(bayes_mod_string), data = data_jags, n.chains = 3, n.adapt = 1e3)
update(bayes_mod, 1e3)
bayes_sim = coda.samples(bayes_mod, n.iter = 1e3, thin = 100,
                           variable.names = c("home_adv", "att", 'def', "mu", "tau"))
bayes_csim = as.mcmc(do.call(bayes_sim))

And below my code in rstan:

// Data

data {
  int n_games;
  int n_teams;
  int FTHG[n_games];
  int FTAG[n_games];
  int HomeTeam[n_games];
  int AwayTeam[n_games];
}
                  
// Params

parameters {
  
  real<lower=0> home_adv;
  real att;
  real def;
  real mu;
  real tau;
  
}

// Model

model {
  
  for (i in 1:n_games)
    FTHG[i] ~ dpois(lambda_home[HomeTeam[i],AwayTeam[i]]);
    FTAG[i] ~ dpois(lambda_away[HomeTeam[i],AwayTeam[i]]);
    
  for (home_i in 1:n_teams) {
      for(away_i in 1:n_teams) { 
        lambda_home[home_i, away_i] = exp(home_adv + att[home_i] - def[away_i]);
        lambda_away[home_i, away_i] = exp(att[away_i] - def[home_i]);
      }
  }
  for (j in 1:n_teams) {
      att.star[j] ~ dnorm(mu, tau);
      def.star[j] ~ dnorm(mu, tau);
      att[j] = att.star[j] - mean(att.star[]) ;
      def[j] = def.star[j] - mean(def.star[]) ;
    }
    
  home_adv ~ dnorm(0.2, 0.0625);
  mu ~ dnorm(0, 0.0625);
  tau ~ dgamma(0.01, 0.01);
}

Some preparations

teams = unique(c(data_set$HomeTeam, data_set$AwayTeam))

n_games = 380
n_teams = 20
FTHG = data_set$FTHG
FTAG = data_set$FTAG
HomeTeam = as.numeric(factor(data_set$HomeTeam, levels = teams))
AwayTeam = as.numeric(factor(data_set$AwayTeam, levels = teams))

I then save it as draft.stan, run by stan_model('draft.stan') but got this error message:

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
0
Syntax error in 'string', line 35, column 8, lexing error:
Invalid character found.

I think it's because of the sign =? I tried transformed parameters but it didn't work too.

1

There are 1 answers

2
A. S. K. On BEST ANSWER

Part of the problem is that the Stan code posted here includes a lot of JAGS syntax that isn't valid in Stan. (It's possible that this explains all of the problems; I've never done JAGS, so I can't say for sure.) Be sure to read the Stan documentation; Stan syntax is different from JAGS syntax in a lot of ways, so just copying large blocks of JAGS code into Stan is very unlikely to work.

Some of the changes to the original code that were needed include the following:

  • The distribution names in Stan are poisson, normal, etc., not dpois, dnorm, etc. See this page in the documentation.
  • Since there's one att parameter for each team, it has to be declared explicitly as a vector. Similarly for def.
  • You're correct that lambda_home and lambda_away have to be defined in the transformed parameters block. They also have to be explicitly declared.

Here's a complete Stan program that compiles and runs.

data {
  int n_games;
  int n_teams;
  int FTHG[n_games];
  int FTAG[n_games];
  int HomeTeam[n_games];
  int AwayTeam[n_games];
}

parameters {
  
  // Home team advantage.
  real<lower=0> home_adv;
  
  // Skill at attacking.  One parameter per team.
  vector[n_teams] att;
  
  // Skill at defending.  One parameter per team.
  vector[n_teams] def;
  
  // Hyperparameters for team-level parameters.  The code provided in the answer
  // used the same hyperparameters for both attacking and defending, but the
  // description of the model showed separate hyperparameters for each.  I'm
  // going to assume that the description is what was intended and define
  // separate hyperparameters for attacking and defending.  Note that if tau is
  // going to be the sd of a normal distribution, it can't be less than 0.
  real mu_att;
  real mu_def;
  real<lower=0> tau_att;
  real<lower=0> tau_def;
  
}

transformed parameters {
  // For each combination of home team + away team, pre-compute the lambdas for
  // the Poisson distributions.
  matrix[n_teams,n_teams] lambda_home;
  matrix[n_teams,n_teams] lambda_away;
  for(home_i in 1:n_teams) {
    for(away_i in 1:n_teams) {
      lambda_home[home_i,away_i] = exp(home_adv + att[home_i] - def[away_i]);
      lambda_away[home_i,away_i] = exp(att[away_i] - def[home_i]);
    }
  }
}

model {
  
  // Prior on the home team advantage.
  home_adv ~ normal(0.2, 0.0625);
  
  // Priors on hyperparameters.
  mu_att ~ normal(0, 0.0625);
  mu_def ~ normal(0, 0.0625);
  tau_att ~ gamma(0.01, 0.01);
  tau_def ~ gamma(0.01, 0.01);
  
  // Distribution of hyperparameters.
  att ~ normal(mu_att, tau_att);
  def ~ normal(mu_def, tau_def);
  
  // Model of goals.
  for (i in 1:n_games) {
    FTHG[i] ~ poisson(lambda_home[HomeTeam[i],AwayTeam[i]]);
    FTAG[i] ~ poisson(lambda_away[HomeTeam[i],AwayTeam[i]]);
  }
  
}

This code does run, but it's not necessarily what you want. When I ran it on your data, I got a handful of divergent transitions, which suggests that something about the model specification could be improved. I tried switching to a non-centered parameterization (which is possibly what your original JAGS program did; is that what the stars are about?). But, surprisingly, that made things even worse!

At any rate, this program runs, so it should be a good starting point for further exploration.