I have some real data that I'm afraid is somewhat nasty.

It's essentially a positive (zero-truncated) negative binomial distribution, with no zero counts at all. However, there are some outliers that seem to cause bad calculations (maybe underflow or NaNs?). The first 8 or so entries are reasonable, but I'm guessing the last few are causing problems with the fitting.

Here's the data:

> df
   counts  t
1    1968  1
2     217  2
3      55  3
4      26  4
5      11  5
6       5  6
7       8  7
8       3  8
9       1 10
10      1 11
11      1 12
12      1 13
13      1 15
14      1 18
15      1 26
16      1 59
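
For reproducibility, the data frame can be rebuilt with:

df <- data.frame(
  counts = c(1968, 217, 55, 26, 11, 5, 8, 3, 1, 1, 1, 1, 1, 1, 1, 1),
  t      = c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 18, 26, 59)
)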

This command runs for a while and then spits out the error message

> vglm(counts ~ t, data=df, family = posnegbinomial)
Error in if (take.half.step) { : missing value where TRUE/FALSE needed

BUT, if I rerun this with the outliers cut off, I get a solution for posnegbinomial:

> vglm(counts ~ t, data=df[1:9,], family = posnegbinomial)
Call:
vglm(formula = counts ~ t, family = posnegbinomial, data = df[1:9,])

Coefficients:
(Intercept):1 (Intercept):2             t 
    7.7487404     0.7983811    -0.9427189 

Degrees of Freedom: 18 Total; 15 Residual
Log-likelihood: -36.21064 

If I try the family pospoisson (Positive Poisson: no zero values), I get a similar error "argument is not interpretable as logical".
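
That is:

> vglm(counts ~ t, data=df, family = pospoisson)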

I do notice that there are a number of similar questions on Stack Overflow about "missing value where TRUE/FALSE needed" errors, but involving other R packages. This suggests to me that package authors may need to better anticipate that calculations can fail.

Answer (Ben Bolker):

I think your proximal problem is that the predicted negative binomial means for your extreme values are so close to zero that they are underflowing to zero, in a way that was not anticipated/protected against by the package authors. (One thing to realize about nonlinear optimization/fitting is that it is always possible to break a fitting method by giving it extreme data ...)
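
To see the scale of the problem, here's a quick check using the coefficients from the truncated fit in the question (the mean model is mu = exp(intercept + slope*t)):

exp(7.7487404 - 0.9427189*59)
## [1] ~1.6e-21

With a predicted mean that small, the zero-truncation normalizing constant 1 - P(Y=0) evaluates to exactly zero in double precision (P(Y=0) rounds to 1), which is presumably where the NaNs come from.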

I couldn't get this to work in VGAM, but I'll offer a couple of other suggestions.

Plotting the data on a log scale:

plot(log(counts)~t, data=df)

And eyeballing the data to get an initial estimate of parameter values (at least for the mean model):

m0 <- lm(log(counts)~t, data=subset(df, t<10))

I thought I might be able to get vglm() to work by setting starting values, but that didn't actually pan out, even when I used fairly good values from other platforms (see below).
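
For the record, the sort of thing I tried looked like this (a sketch: coefstart is vglm's argument for initial coefficient values, given in the order of the truncated fit above, i.e. log-mean intercept, log-size intercept, slope):

try(vglm(counts ~ t, data=df, family = posnegbinomial,
         coefstart = c(7.75, 0.80, -0.94)))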

glmmADMB

The glmmADMB package can handle positive NB, via family="truncnbinom":

library(glmmADMB)
m1 <- glmmadmb(counts~t, data=df, family="truncnbinom")

(there are some warning messages ...)
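
The pieces of this fit used in the comparison below are coef(m1) (intercept and slope on the log scale) and m1$alpha (the estimated negative binomial size/dispersion parameter):

coef(m1)   ## (Intercept) and t, on the log scale
m1$alpha   ## NB size (dispersion) parameter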

bbmle::mle2()

This requires a little bit more work: it failed with the standard model, but works if I set a floor on the predicted mean ...

library(VGAM)  ## for dposnegbin
library(bbmle)
m2 <- mle2(counts ~ dposnegbin(size=exp(logk),
                               munb=pmax(exp(logeta), 1e-7)),
           parameters=list(logeta~t),
           data=df,
           start=list(logk=0, logeta=0))

There are warning messages again.
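
The warnings most likely come from NaNs generated at extreme trial parameter values during optimization; as a sanity check (assuming the default optim-based optimizer), the convergence code is stored in the details slot:

m2@details$convergence   ## 0 indicates successful convergence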

Compare glmmADMB, mle2, simple truncated lm fit ...

cc <- cbind(coef(m2),
            c(log(m1$alpha), coef(m1)),
            c(NA, coef(m0)))
dimnames(cc) <- list(c("log_k","log_int","slope"),
                     c("mle2","glmmADMB","lm"))

##               mle2   glmmADMB         lm
## log_k    0.8094678  0.8094625         NA
## log_int  7.7670604  7.7670637  7.1747551
## slope   -0.9491796 -0.9491778 -0.8328487

This is in principle also possible with glmmTMB, but it runs into the same kinds of problems as vglm() ...
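
For completeness, the attempted call was along these lines (a sketch: truncated_nbinom2 is glmmTMB's zero-truncated negative binomial family):

library(glmmTMB)
## fails on the full data in much the same way as vglm()
m3 <- glmmTMB(counts ~ t, data=df, family=truncated_nbinom2())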