Hurdle model: In sqrt(diag(object$vcov)) : NaNs produced with no data for some levels

153 views Asked by At

Good afternoon, I am trying to run a hurdle model with two dependent variables: probability of disease persistence (0 or 1; the 'hurdle') and disease prevalence rate given there was disease persistence. My response variable is number of infected individuals, which I offset by total population, instead of prevalence rate because negative binomial distributions require count data.

I am assessing two covariates: removal rate and land access rate. There are 7 levels for each variable and the levels are the same for both: 10, 20, 30, 40, 50, 70, and 100. I do not have scenarios where both rates are 100% because this would cause population and disease extinction in the model. Therefore, I have 48 combinations of rates (7x7=49-1=48 scenarios) and 100 repetitions per scenario (n=4800). For example:

hurdle_data <- data.frame(Remove_Rate = c(10,20,30,40,50,70),
   Land_Access= c(20,30,40,50,70,100),
   Persistence= c(1,1,1,0,0,0),
   Num_Infected= c(77,71,74,0,0,0),
   Total_pop= c(200,198,187,191,170,165))

With the model:

library(pscl)
hurdle(formula = Num_Infected ~ Land_Access * Remove_Rate * offset(log(Total_pop)) | Land_Access * Remove_Rate, data = hurdle_data, dist = "negbin")

I am getting the error: Warning message: In sqrt(diag(object$vcov)) : NaNs produced. I believe it is because I do not have the 100x100 scenario to complete the 7 by 7 matrix of rate combinations. See the results here:

Count model coefficients (truncated negbin with log link):
                          Estimate Std. Error
(Intercept)             -0.9756389  0.0230297
Land_Access             -0.0002008        NaN
Remove_Rate             -0.0005532        NaN
Land_Access:Remove_Rate -0.0002561        NaN
Log(theta)               0.4061206  0.0058463
                        z value Pr(>|z|)    
(Intercept)              -42.36   <2e-16 ***
Land_Access                 NaN      NaN    
Remove_Rate                 NaN      NaN    
Land_Access:Remove_Rate     NaN      NaN    
Log(theta)                69.47   <2e-16 ***
Zero hurdle model coefficients (binomial with logit link):
                          Estimate Std. Error
(Intercept)              1.0087696  0.0596136
Land_Access              0.0027399        NaN
Remove_Rate              0.0000686        NaN
Land_Access:Remove_Rate -0.0003203        NaN
                        z value Pr(>|z|)    
(Intercept)               16.92   <2e-16 ***
Land_Access                 NaN      NaN    
Remove_Rate                 NaN      NaN    
Land_Access:Remove_Rate     NaN      NaN    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta: count = 1.501
Number of iterations in BFGS optimization: 12 
Log-likelihood: -1.965e+04 on 9 Df

I would like to use this output as a prior for a Bayesian conditional model, and these issues are affecting that model's output so I cannot just ignore this (as I saw others have suggested with this issue).

I should mention that this model worked great with a similar set up but three categorical covariates instead of the above two. For example:

hurdle_data <- data.frame(Study_Area = c('Urban','Urban','Urban','Rural','Rural','Rural'),
Density= c(1,2,3,1,2,3),
Removal_Method=c('Ring','Parcel','Habitat','Ring','Parcel','Habitat')
Persistence= c(1,1,1,0,0,0)
Num_Infected= c(77,71,74,0,0,0),
Total_pop= c(200,198,187,191,170,165))

Any suggestions on how to get around this issue would be really appreciated. Thank you.

0

There are 0 answers