I want to know how to resolve perfect multicollinearity in a GLM that I fit in R. I want to see whether morphological measurements can predict a bird's day of arrival on its territory. I have tarsus, wing and tail measurements, and I also want to see whether males and females differ.
So, I'm using the code:
myggod <- glm(day_territory ~ sex * (Right_tarsus + Right_wing +
Tail_length), data = territory, family = "poisson")
which produces the following output:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 19.626581 17.831173 1.101 0.27103
sexfemale -14.645707 17.852832 -0.820 0.41201
sexmale -12.343274 17.835662 -0.692 0.48890
Right_tarsus -0.920874 1.233841 -0.746 0.45546
Right_wing -0.007466 0.016571 -0.451 0.65233
Tail_length -0.043216 0.013195 -3.275 0.00106 **
sexfemale:Right_tarsus 0.883152 1.234115 0.716 0.47423
sexmale:Right_tarsus 0.846497 1.233209 0.686 0.49245
sexfemale:Right_wing 0.018863 0.020855 0.904 0.36574
sexmale:Right_wing NA NA NA NA
sexfemale:Tail_length 0.021428 0.015584 1.375 0.16911
sexmale:Tail_length NA NA NA NA
So I have perfect multicollinearity in the males' tail and wing terms.
I have already tried scale() with center = TRUE, subtracting the mean from each measure, log-transforming the measures, and replacing wing and tail with the PC1 from a PCA of the two.
Nothing worked. I get the same issue with all of these methods; even when both measures are replaced by PC1, the same NAs appear ...
So, how can I solve it?
We can eliminate the overparameterization problem by removing the interaction effects from the model.
...and the output:
The AIC on the overparameterized model is 1087.8, so the model with fewer parameters is slightly better than the overparameterized one.
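As a rough illustration of this comparison, the sketch below fits both the interaction model and the main-effects model and compares their AIC values. The data frame `territory_sim` is simulated and purely hypothetical (only the column names follow the question), so its numbers will not match the 1087.8 quoted above, and with only two sex levels the interaction model here is fully estimable.

```r
# Simulated stand-in for the `territory` data frame (hypothetical values,
# not the asker's data); only the column names follow the question.
set.seed(1)
n <- 120
territory_sim <- data.frame(
  sex          = factor(sample(c("female", "male"), n, replace = TRUE)),
  Right_tarsus = rnorm(n, mean = 20, sd = 1),
  Right_wing   = rnorm(n, mean = 75, sd = 3),
  Tail_length  = rnorm(n, mean = 60, sd = 4)
)
territory_sim$day_territory <- rpois(
  n, lambda = exp(5 - 0.04 * territory_sim$Tail_length)
)

# Model from the question: sex interacting with every measurement
fit_inter <- glm(day_territory ~ sex * (Right_tarsus + Right_wing + Tail_length),
                 data = territory_sim, family = poisson)

# Main-effects model: the interaction terms are removed
fit_main <- glm(day_territory ~ sex + Right_tarsus + Right_wing + Tail_length,
                data = territory_sim, family = poisson)

# An NA coefficient would signal a term that could not be estimated
anyNA(coef(fit_main))

# Side-by-side degrees of freedom and AIC; lower AIC is preferred
aic_table <- AIC(fit_inter, fit_main)
aic_table
```

Both models must be fit to the same rows for the AIC values to be comparable, which matters when missing values differ between the variables used.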
Note that almost half the observations in the data frame were deleted from the analysis due to missing values. You'll need to review the missing data and decide on a strategy for imputing the missing values, or collect more data, to assess whether the sex variable is meaningful.
Also, the dependent variable in a Poisson model is typically a count, and from the original question it's hard to see why a Poisson model is being used here. That is, if the variables Right_tarsus, Right_wing and Tail_length are size measurements of birds, why would size measurements predict counts? If the dependent variable is the day of arrival at a specific location, a Poisson model probably isn't the right model.
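To review the missing data and the sex variable, a check along the following lines may help. One thing worth looking at: the coefficient table in the question lists both sexfemale and sexmale, which suggests that sex has a third level acting as the reference (for example blank or unknown entries). That reading is an assumption, not something the question confirms. The simulated data frame `territory_check` below is hypothetical and deliberately contains a blank-coded level with only two rows, to show how such a level can leave interaction terms inestimable.

```r
# Hypothetical stand-in data (not the asker's data): a blank-coded third
# sex level with two rows, plus injected missing wing measurements.
set.seed(42)
n <- 100
sex_raw <- c("", "", sample(c("female", "male"), n - 2, replace = TRUE))
territory_check <- data.frame(
  sex           = factor(sex_raw),
  Right_tarsus  = rnorm(n, mean = 20, sd = 1),
  Right_wing    = rnorm(n, mean = 75, sd = 3),
  Tail_length   = rnorm(n, mean = 60, sd = 4),
  day_territory = rpois(n, lambda = 12)
)
territory_check$Right_wing[sample(3:n, 30)] <- NA  # inject missing values

# Rows per sex level, including the blank level and any NA
table(territory_check$sex, useNA = "ifany")

# Missing values per column, and the number of rows glm() would keep
colSums(is.na(territory_check))
n_complete <- sum(complete.cases(territory_check))

fit_check <- glm(day_territory ~ sex * (Right_tarsus + Right_wing + Tail_length),
                 data = territory_check, family = poisson)

# Terms that could not be estimated come back as NA coefficients
aliased_terms <- names(coef(fit_check))[is.na(coef(fit_check))]
aliased_terms
```

If the real data show a stray level like this, dropping or recoding those rows and then calling droplevels() on the data frame before refitting is one option to consider. Rescaling, centering or log-transforming the measurements would not be expected to help in that situation, because the problem lies in the group sizes and not in the scale of the predictors.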