Good afternoon, I am trying to run a hurdle model with two dependent variables: probability of disease persistence (0 or 1; the 'hurdle') and disease prevalence rate given there was disease persistence. My response variable is number of infected individuals, which I offset by total population, instead of prevalence rate because negative binomial distributions require count data.
I am assessing two covariates: removal rate and land access rate. There are 7 levels for each variable and the levels are the same for both: 10, 20, 30, 40, 50, 70, and 100. I do not have scenarios where both rates are 100% because this would cause population and disease extinction in the model. Therefore, I have 48 combinations of rates (7x7=49-1=48 scenarios) and 100 repetitions per scenario (n=4800). For example:
hurdle_data <- data.frame(Remove_Rate = c(10,20,30,40,50,70),
Land_Access= c(20,30,40,50,70,100),
Persistence= c(1,1,1,0,0,0),
Num_Infected= c(77,71,74,0,0,0),
Total_pop= c(200,198,187,191,170,165))
With the model:
library(pscl)
hurdle(formula = Num_Infected ~ Land_Access * Remove_Rate * offset(log(Total_pop)) | Land_Access * Remove_Rate, data = hurdle_data, dist = "negbin")
I am getting the error: Warning message: In sqrt(diag(object$vcov)) : NaNs produced. I believe it is because I do not have the 100x100 scenario to complete the 7 by 7 matrix of rate combinations. See the results here:
Count model coefficients (truncated negbin with log link):
Estimate Std. Error
(Intercept) -0.9756389 0.0230297
Land_Access -0.0002008 NaN
Remove_Rate -0.0005532 NaN
Land_Access:Remove_Rate -0.0002561 NaN
Log(theta) 0.4061206 0.0058463
z value Pr(>|z|)
(Intercept) -42.36 <2e-16 ***
Land_Access NaN NaN
Remove_Rate NaN NaN
Land_Access:Remove_Rate NaN NaN
Log(theta) 69.47 <2e-16 ***
Zero hurdle model coefficients (binomial with logit link):
Estimate Std. Error
(Intercept) 1.0087696 0.0596136
Land_Access 0.0027399 NaN
Remove_Rate 0.0000686 NaN
Land_Access:Remove_Rate -0.0003203 NaN
z value Pr(>|z|)
(Intercept) 16.92 <2e-16 ***
Land_Access NaN NaN
Remove_Rate NaN NaN
Land_Access:Remove_Rate NaN NaN
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta: count = 1.501
Number of iterations in BFGS optimization: 12
Log-likelihood: -1.965e+04 on 9 Df
I would like to use this output as a prior for a Bayesian conditional model, and these issues are affecting that model's output so I cannot just ignore this (as I saw others have suggested with this issue).
I should mention that this model worked great with a similar set up but three categorical covariates instead of the above two. For example:
hurdle_data <- data.frame(Study_Area = c('Urban','Urban','Urban','Rural','Rural','Rural'),
Density= c(1,2,3,1,2,3),
Removal_Method=c('Ring','Parcel','Habitat','Ring','Parcel','Habitat')
Persistence= c(1,1,1,0,0,0)
Num_Infected= c(77,71,74,0,0,0),
Total_pop= c(200,198,187,191,170,165))
Any suggestions on how to get around this issue would be really appreciated. Thank you.