Fitting zero-truncated negative binomial to some datasets with VGLM fails

248 views Asked by At

I've been trying to fit a zero truncated negative binomial to some count data. In some cases it works beautifully, for example:

library(ggplot2)
library(VGAM)
df <- data.frame(n = 1:7, freq = c(0.32, 0.286, 0.19, 0.10, 0.05, 0.02, 0.01))
fit = vglm(df$n ~ 1, posnegbinomial, weights=df$freq)
ggplot(df, aes(x = n, y = freq)) +
  geom_point(colour= "red") +
  geom_point(aes(x = n, 
                 y = dnbinom(n, size=Coef(fit)[2], 
                             mu = Coef(fit)[1])/(1-dnbinom(0, size=Coef(fit)[2], 
                                                           mu = Coef(fit)[1]))),
                 colour = "blue")

1

However, in other cases it fails:

df2 <- data.frame(n = 1:4, freq = c(0.87, 0.11, 0.01, 0.0005))
fit2 = vglm(df2$n ~ 1, posnegbinomial, weights=df2$freq)
ggplot(df2, aes(x = n, y = freq)) +
  geom_point(colour= "red") +
  geom_point(aes(x = n, 
                 y = dnbinom(n, size=Coef(fit2)[2], 
                             mu = Coef(fit2)[1])/(1-dnbinom(0, size=Coef(fit2)[2], 
                                                           mu = Coef(fit2)[1]))),
                 colour = "blue")

enter image description here

In such cases a Poisson fit works fine:

fit_p = vglm(df2$n ~ 1, pospoisson, weights=df2$freq)

ggplot(df2, aes(x = n, y = freq)) +
  geom_point(colour= "red") +
  geom_point(aes(x = n, 
                 y = dpois(n, Coef(fit_p)[1])/(1-dpois(0, Coef(fit_p)[1]))),
             colour = "blue")

enter image description here

Is there any way I can fix this?

Thanks!

1

There are 1 answers

3
jay.sf On

Are you sure you haven't mixed up something in your second model? I'm getting a much better fit. To avoid mistakes in repeating code, you may want to use a function for y, as I show you below:

library(VGAM)
fit1 <- vglm(df1$n ~ 1, posnegbinomial, weights=df1$freq)
fit2 <- vglm(df2$n ~ 1, posnegbinomial, weights=df2$freq)

## helper fun to get y
gy <- function(x) {
  cf <- Coef(x)
  dnbinom(seq(nrow(x@residuals)), size=cf[2], mu=cf[1]) / 
    (1-dnbinom(0, size=cf[2], mu=cf[1]))
}

## plot
op <- par(mfrow=c(1, 2))

plot(df1$n, df1$freq, main="fit1")
points(gy(fit1), col=2)

plot(df2$n, df2$freq, main="fit2")
points(gy(fit2), col=2)

par(op)

enter image description here


Data:

df1 <- data.frame(n=1:7, freq=c(0.32, 0.286, 0.19, 0.10, 0.05, 0.02, 0.01))
df2 <- data.frame(n=1:4, freq=c(0.87, 0.11, 0.01, 0.0005))