Error when trying to run fixed effects logistic regression


I am not sure where I can get help, since this exact post was considered off-topic on StackExchange.

I want to run some regressions based on a balanced panel with electoral data from Brazil, focusing on 2 time periods. I want to understand whether, after a change in legislation that prohibited firm donations to candidates, the individuals who depended most on these resources had a lower probability of getting elected.

I have already run a regression like this in R:

model_continuous <- plm(percentage_of_votes ~ time + 
                        treatment + time*treatment, data = dataset, model = 'fd')

In this model I used a continuous variable (% of votes) as my dependent variable. My treatment units are those that in time = 0 had no campaign contributions coming from corporations.

Now I want to change my dependent variable to a binary variable indicating whether the candidate was elected in that year. All of my units were elected in time = 0. How can I estimate a logit or probit model using fixed effects? I have tried using the pglm package in R.

model_binary <- pglm(dummy_elected ~ time + treatment + time*treatment, 
                           data = dataset, 
                           effects = 'twoways',
                           model = 'within',
                           family = 'binomial',
                           start = NULL)

However, I got this error:

Error in maxRoutine(fn = logLik, grad = grad, hess = hess, start = start,  : 
  argument "start" is missing, with no default

Why is that happening? What is wrong with my model? Is it conceptually correct? I want the second regression to be as similar as possible to the first one.

I have read that the clogit function from the survival package could do the job, but I don't know how to use it.

Edit:

This is what a sample dataset could look like:

dataset <- data.frame(individual = c(1,1,2,2,3,3,4,4,5,5),
                      time = c(0,1,0,1,0,1,0,1,0,1),
                      treatment = c(0,0,1,1,0,0,1,1,0,0),
                      corporate = c(0,0,0.1,0,0,0,0.5,0,0,0))

There is 1 answer

dmb On

Based on the comments, I believe the logistic regression reduces to treatment and dummy_elected. Accordingly I have fabricated the following dataset:

dataset <- data.frame("treatment" = c(rep(1,1000),rep(0,1000)),
         "dummy_elected" = c(rep(1, 700), rep(0, 300), rep(1, 500), rep(0, 500)))

I then ran the GLM model:

# glm() is part of base R's stats package, so no extra library is needed
model_binary <- glm(dummy_elected ~ treatment, family = binomial(), data = dataset)
summary(model_binary)

Note that the treatment coefficient is significant. Plugging the estimated intercept and treatment coefficient from the summary into the logistic function gives the resulting probabilities:

P(dummy_elected = 1) = 1 / (1 + exp(-(1.37674342264577e-16 + 0.847297860386033 * treatment)))
P(dummy_elected = 0) = 1 - 1 / (1 + exp(-(1.37674342264577e-16 + 0.847297860386033 * treatment)))

Note that these probabilities are consistent with the frequencies I used to generate the data: 0.7 for treatment = 1 and 0.5 for treatment = 0.
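
As a quick check of that consistency, here is a minimal sketch that refits the model on the fabricated data and asks predict() for the fitted probability at each treatment level. The names new_data, fitted_prob and observed_share are introduced only for this illustration.

```r
# Rebuild the fabricated data used in this answer
dataset <- data.frame(
  treatment     = c(rep(1, 1000), rep(0, 1000)),
  dummy_elected = c(rep(1, 700), rep(0, 300), rep(1, 500), rep(0, 500))
)

# Same logistic regression as above
model_binary <- glm(dummy_elected ~ treatment, family = binomial(), data = dataset)

# Fitted P(dummy_elected = 1) for treatment = 0 and treatment = 1
new_data <- data.frame(treatment = c(0, 1))
fitted_prob <- predict(model_binary, newdata = new_data, type = "response")

# Observed share of elected candidates in each group, for comparison
observed_share <- as.numeric(tapply(dataset$dummy_elected, dataset$treatment, mean))

print(cbind(new_data, fitted_prob, observed_share))
```

Because the model has one parameter per treatment group, the fitted probabilities should match the observed shares up to numerical tolerance.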

So for each row, compare the two probabilities above: the predicted value of dummy_elected is the outcome with the higher probability.
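
A minimal sketch of that classification step, assuming the fabricated data from this answer. Picking the outcome with the higher probability is the same as checking whether P(dummy_elected = 1) exceeds 0.5. The names prob_elected and predicted are illustrative. One caveat: for treatment = 0 the two probabilities are essentially tied at 0.5, so the predicted class for those rows depends on floating-point rounding and should not be over-interpreted.

```r
# Rebuild the fabricated data and refit the model
dataset <- data.frame(
  treatment     = c(rep(1, 1000), rep(0, 1000)),
  dummy_elected = c(rep(1, 700), rep(0, 300), rep(1, 500), rep(0, 500))
)
model_binary <- glm(dummy_elected ~ treatment, family = binomial(), data = dataset)

# Fitted P(dummy_elected = 1) for every row
prob_elected <- fitted(model_binary)

# Predict 1 when P(elected) is larger than P(not elected), i.e. above 0.5
predicted <- ifelse(prob_elected > 0.5, 1, 0)

# Cross-tabulate predictions by treatment group
print(table(treatment = dataset$treatment, predicted = predicted))
```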