Why returns subset() in mle2 an error whilst subsetting manually works

136 views Asked by At

Update: for a minimal example scroll down (does not reproduce the same error but different results)

a word of caution before my actual question: I am quite new to R so I hope I make sense. I have tried figuring out the problem using google and the debug functions of R ( although I am still not too familiar with them).

I have read amongst others:

Question on when to use %in% and when == while subsetting

subset vs adressing rows and cols via []

which brought me to the very good blog of Hadley Wickham, where I will further look into the debugging features of R, his article on: Non standard evaluation

My problem

I have a data frame called data containing data on corporate bonds amongst others yield to maturity, life and rating of a bond.

I want to use the mle2 function from the bbmle package ,which is a wrapper for the optim function, to estimate a model for the term structure, see code below. Whether or not the assumption of normally distributed residuals is justified or not is open to question, nevertheless:

Approach 1: "manually" subsetting works just fine:

require("bbmle")

lifeBBB <- data$life[data$Rating == "BBB"]
yieldBBB <- data$yield[data$Rating == "BBB"]



LL <- function(b0, b1, b2, lambda, mu, sigma){ 

Score= yieldBBB - (b0+b1*((1-exp(-lambda*lifeBBB))/(lambda*lifeBBB))+ b2*((1-exp(-lambda*lifeBBB))/(lambda*lifeBBB)-exp(-lambda*lifeBBB)))

    Score = suppressWarnings(dnorm(Score, mu, sigma, log=TRUE))

    -sum(Score)

}

fit <- mle2(LL, start = list(b0 = 108, b1 = -85, b2=168, lambda= 0.12,  mu = -10, sigma=70),control=list(maxit = 5000))'

Call:
mle2(minuslogl = LL, start = list(b0 = 108, b1 = -85, b2 = 168, 
    lambda = 0.12, mu = -10, sigma = 70), control = list(maxit = 5000))

Coefficients:
         b0          b1          b2      lambda          mu 
110.2355968 -82.7010072 167.5960478   0.1410541  -7.7644032 
      sigma 
 69.4846302 

Log-likelihood: -1018.8 

Approach 2: When I try to call subset from within the mle2 function an error occurs:

LL2 <- function(b0, b1, b2, lmbd, mu, sigma){

    Score= yield - (b0+b1*((1-exp(-lmbd*life))/(lmbd*life))+ b2*((1-exp(-lmbd*life))/(lmbd*life)-exp(-lmbd*life)))

    Score = suppressWarnings(dnorm(Score, mu, sigma, log=TRUE)) 

    -sum(Score)    
}


 fit1 <- mle2(minuslogl=LL2, start = list(b0 = 108, b1 = -85, b2=168, lmbd= 0.12,  mu = -10, sigma=70),method="BFGS", data=data, subset= Rating=="BBB", control=list(maxit=5000))

Error in optim(par = c(108, -85, 168, 0.12, -10, 70), fn = function (p)  : 
  initial value in 'vmmin' is not finite

So since approach 1 works and approach 2 should be a more convenient way of doing the exact same thing without having to introduce new variables for every rating, I figured that something in my code has to be wrong. As I said I tried to use the debugging method and ended up somewhere deep in the machine room of R with an entirely different error.

On the error produced I found this from the R Mailing list, the problem here was that fractions instead of integers were provided as the size parameter of a dbinom distribution. But I can't see how this is an issue in my code.

Thanks in advance

B. Lör

On request I tried to come up with a minimal example which uses exemplary data. The minimal example does not produce an error but two different set of estimates. This is peculiar since in my understanding both functions should do the same

#### minmial example

require("bbmle")
## approach 1:

set.seed(23456)

Rating <- c(rep("A",38),rep("BBB",39) )
yield <- rnorm(77,0.5,15)
life <- runif(77,1,20)

exdata <- data.frame(Rating,yield,life)


lifeBBB <- exdata$life[exdata$Rating == "BBB"]
yieldBBB <- exdata$yield[exdata$Rating == "BBB"]

LL <- function(b0, b1, b2, lambda, mu, sigma){

    Score= yieldBBB - (b0+b1*((1-exp(-lambda*lifeBBB))/(lambda*lifeBBB))+ b2*((1-exp(-lambda*lifeBBB))/(lambda*lifeBBB)-exp(-lambda*lifeBBB)))

    Score = suppressWarnings(dnorm(Score, mu, sigma, log=TRUE))

    -sum(Score)

}

fit <- mle2(LL, start = list(b0 = 100, b1 = -80, b2=160, lambda= 0.12,  mu = 1, sigma=14),method ="BFGS",control=list(maxit = 5000)) 


### approach 2:


LL2 <- function(b0, b1, b2, lmbd, mu, sigma){

    Score= exdata$yield - (b0+b1*((1-exp(-lmbd*exdata$life))/(lmbd*exdata$life))+ b2*((1-exp(-lmbd*exdata$life))/(lmbd*exdata$life)-
                                                                                          exp(-lmbd*exdata$life)))

    Score = suppressWarnings(dnorm(Score, mu, sigma, log=TRUE)) ## assumption is that residuals are normally distributed

    -sum(Score)    
}


fit1 <- mle2(minuslogl=LL2, start = list(b0 = 100, b1 = -80, b2=160, lmbd= 0.12,  mu = 1, sigma=14),method="BFGS", data=exdata, subset= Rating=="BBB", control=list(maxit=5000))

Call(fit):

Call:
mle2(minuslogl = LL, start = list(b0 = 100, b1 = -80, b2 = 160, 
    lambda = 0.12, mu = 1, sigma = 14), method = "BFGS", control = list(maxit = 5000))

Coefficients:
          b0           b1           b2       lambda           mu        sigma 
 94.34166416 -85.35582080 159.76952349  -0.00283408  -4.65833584  12.94283927 

Log-likelihood: -155.2 



Call (fit1):
all:
mle2(minuslogl = LL2, start = list(b0 = 100, b1 = -80, b2 = 160, 
    lmbd = 0.12, mu = 1, sigma = 14), method = "BFGS", data = exdata, 
    subset = Rating == "BBB", control = list(maxit = 5000))

Coefficients:
           b0            b1            b2          lmbd            mu         sigma 
 9.269496e+01 -8.600709e+01  1.586543e+02 -1.186371e-04 -6.305040e+00  1.458118e+01 

Log-likelihood: -315.6 
0

There are 0 answers