not sure where can I get help, since this exact post was considered off-topic on StackExchange.
I want to run some regressions based on a balanced panel with electoral data from Brazil focusing on 2 time periods. I want to understand if after a change in legislation that prohibited firm donations to candidates, those individuals that depended most on these resources had a lower probability of getting elected.
I have already ran a regression like this on R:
model_continuous <- plm(percentage_of_votes ~ time +
treatment + time*treatment, data = dataset, model = 'fd')
On this model I have used a continuous variable (% of votes) as my dependent variable. My treatment
units or those that in time = 0
had no campaign contributions coming from corporations.
Now I want to change my dependent variable so that it is a binary variable indicating if the candidate was elected on that year. All of my units were elected on time = 0
. How can I estimate a logit
or probit
model using fixed effects? I have tried using the pglm
package in R.
model_binary <- pglm(dummy_elected ~ time + treatment + time*treatment,
data = dataset,
effects = 'twoways',
model = 'within',
family = 'binomial',
start = NULL)
However, I got this error:
Error in maxRoutine(fn = logLik, grad = grad, hess = hess, start = start, :
argument "start" is missing, with no default
Why is that happening? What is wrong with my model? Is it conceptually correct? I want the second regression to be as similar as possible to the first one.
I have read that clogit
function from the survival
package could do the job, but I dont know how to do it.
Edit:
this is what a sample dataset could look like:
dataset <- data.frame(individual = c(1,1,2,2,3,3,4,4,5,5),
time = c(0,1,0,1,0,1,0,1,0,1),
treatment = c(0,0,1,1,0,0,1,1,0,0),
corporate = c(0,0,0.1,0,0,0,0.5,0,0,0))
Based on the comments, I believe the logistic regression reduces to treatment and dummy_elected. Accordingly I have fabricated the following dataset:
I then ran the GLM model:
Note that the treatment coefficient is significant and the coefficients are given. The resulting probabilities are thus
Note that these probabilities are consistent with the frequencies I generated the data.
So for each row, take the max probability across the two equations above and that's the value for dummy_elected.