gamma distribution lower limit in R

1.3k views Asked by At

I would like to fit a gamma distribution to a dataset made of 338 elements, with a fixed low threshold (I'm using R). To represent the low limit I thought to use a gamma with three parameters, adding the location. Here's my code:

library(FAdist)
library(fitdistrplus)
fit <- fitdist(mydata,"gamma3",start=list(1,1,mythreshold))

Everytime i run the code, I got the same error:

<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data,     gr = gradient, ddistnam = ddistname, hessian = TRUE, method = meth,     lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
Error in fitdist(Hs, "gamma3", start = list(1, 1, 3)) : 
  the function mle failed to estimate the parameters, 
                with the error code 100

So I tried to change the starting values, using the ones obtained by the fit of the gamma with two parameters, which gave this result:

fit <- fitdist(mydata,"gamma")
fit
Fitting of the distribution ' gamma ' by maximum likelihood 
Parameters:
       estimate Std. Error
shape 21.417503  1.6348313
rate   5.352422  0.4133735

But it still doesn't work..I would understand it if even the gamma with two parameters didn't work, but this is not the case and I can't give myself an explanation. Moreover, the QQ-plot and the ecdf for the gamma with two parameters are not so good...but if i fit the distribution on the original dataset scaled with respect to the low threshold the fit looks perfect:

fit <- fitdist(mydata-mythreshold,"gamma")
fit
Fitting of the distribution ' gamma ' by maximum likelihood 
Parameters:
      estimate Std. Error
shape 1.059540 0.07212832
rate  1.058007 0.09117620

But then I don't know if it is correct to do like that...the parameters are very different and I need them to calculate the return periods related to my data ! that's why I thought about the gamma with the location parameter.

p.s. i did not attach the data because they are too many, but i can report the summary of them:

 summary(mydata)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.003   3.282   3.753   4.001   4.444   8.087 
1

There are 1 answers

1
Ben Bolker On BEST ANSWER

Set up a reproducible example;

library(FAdist); library(fitdistrplus)
set.seed(101)
x <- rgamma3(1000,shape=1,scale=1,thres=1)

For values of the threshold parameter greater than the minimum value in the data set the likelihood is infinite (since these values are supposed to be impossible/have a value of zero under the truncated distribution). We can make things work by setting an upper bound for this parameter:

fitdist(x,dgamma3,start=list(shape=1,scale=1,thres=0.5),
        upper=c(Inf,Inf,min(x)))
## Fitting of the distribution ' gamma3 ' by maximum likelihood 
## Parameters:
##        estimate Std. Error
## shape 0.9321949         NA
## scale 1.0586079         NA
## thres 1.0000012         NA

Note (1) under maximum likelihood this kind of threshold parameter will always end up at the largest value it can take (i.e., the minimum value in the data set); (2) the standard errors are NA because Wald standard errors can't be computed when the parameters hit a boundary.

Alternatively you can fix the threshold parameter by defining wrapper function:

dgconstr <- function(x,shape,scale,...) {
    dgamma3(x,shape,scale,thres=0.5,...)
}
pgconstr <- function(...) {
    pgamma3(...,thres=0.5)
}

fitdist(x,dgconstr,start=list(shape=1,scale=1))