I have species-specific disease status for hundreds of species. Some species have 10, and others have 100 individuals sampled, all having disease status of positive or negative (1/0). I want to build phylogenetic models in R, to explore what predicts species-specific disease risk and I'm a bit lost on what to do. I found the following tools:
compar.gee:
library(ape)
model1 <- compar.gee(cbind(positive, negative) ~ pred1 + pred2,
phy=phylogeny, family="binomial", data=data)
- this seems to be very easy to work with, but the documentation is scarce. E.g. I don't see if phylogenetic dependence can be estimated using this model (e.g. similarly how lambda is estimated in the case of gls models).
- Second, I do not understand why are there so few articles using this tool. Is there a reason to avoid it? Are type-I or type-II errors too high? Is the model too rigid to incorporate phylogenetic dependence?
- GLS regression with logit transformed proportions
library(nlme)
model2 <- gls(logit(proportion) ~ pred1 + pred2,
cor=corPagel(1,phylogeny,form=~species), data=data)
- problematic, as logit(0) is undefined and I have taxa where no positive cases were found. I know car::logit(x) has a way around this, but I'm not a big fan of this as the transformed values will change depending on the range of the original values.
MCMCglmm
library(MCMCglmm)
inv.phylo<-inverseA(phylogeny,nodes="TIPS",scale=TRUE)
model3 <- MCMCglmm(cbind(positive, negative) ~ pred1 + pred2,
random=~phylo+species,
family = cbind("binomial"),
ginverse=list(phylo=inv.phylo$Ainv),
prior = list(G=list(G1=list(V=1,nu=0.02),
G2=list(V=1,nu=0.02)),
R=list(V=1,nu=0.02)),
data = data,
nitt = 10000, thin = 5, burnin = 1000)
- Can be problematic due to the difficulty of visualizing the results, time-consuming computations, especially if multiple priors are tested, and potential problems with convergence.
Which tool (OR OTHER) would be best? Which of these should be avoided (if any)?
Thanks a lot for reading this and thinking about the answer!
Cheers,
Jasmine