Issue with constrOptim


When doing constrained optimization using the constrOptim function, I sometimes get the following error message:

Error in optim(theta.old, fun, gradient, control = control, method = method,  : 
  initial value in 'vmmin' is not finite

Example

x <- c(-0.2496881061155757641767394261478330008685588836669921875, 
        0.0824038146359631351600683046854101121425628662109375, 
        0.25000000111421105675191256523248739540576934814453125)

nw <- length(x)
ui <- diag(1, nrow = nw)
ui <- rbind(ui, rep(0, nw))
ui[cbind(2:(nw + 1), 1:nw)] <- -1
ci <- rep(-0.8 / (nw + 1), nw + 1)

constrOptim(theta = rep(0, nw), f = function(theta) mean((theta - x)^2),
            grad = function(theta) 2 * (theta - x), ui = ui, ci = ci, 
            method = "BFGS")

What I know

The problem occurs during the iteration inside constrOptim, when the intermediate result comes so close to the boundary that almost all points evaluated by the BFGS optimizer (excluding the initial point) yield NaN. In this case, BFGS will sometimes return an optimal value of NaN and a corresponding minimizing parameter outside the constraint set.

In constrOptim, the objective function fed to BFGS is given by

R <- function(theta, theta.old, ...) {
  ui.theta <- ui %*% theta
  gi <- ui.theta - ci
  if (any(gi < 0))  {
    return(NaN) 
  }
  gi.old <- ui %*% theta.old - ci
  bar <- sum(gi.old * log(gi) - ui.theta)
  if (!is.finite(bar)) 
    bar <- -Inf
  f(theta, ...) - mu * bar
}

My question

It seems to me that the obvious solution to the problem is to simply return sign(mu) * Inf instead of NaN if there are any gi < 0, but could this fix lead to other problems?
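As a sketch of the proposed change, here is a hypothetical modified version of the internal objective (call it R2, with ui, ci, mu, and f passed in explicitly so the example is self-contained); the only difference from the original R above is the return value outside the constraint set:

```r
# Hypothetical modified version of constrOptim's internal objective R():
# the only change is returning sign(mu) * Inf instead of NaN when the
# point is outside the constraint set.
R2 <- function(theta, theta.old, f, ui, ci, mu, ...) {
  ui.theta <- ui %*% theta
  gi <- ui.theta - ci
  if (any(gi < 0)) {
    # Infinitely bad value (for minimization, mu > 0, this is +Inf),
    # so BFGS can never report an infeasible point as the optimum.
    return(sign(mu) * Inf)
  }
  gi.old <- ui %*% theta.old - ci
  bar <- sum(gi.old * log(gi) - ui.theta)
  if (!is.finite(bar))
    bar <- -Inf
  f(theta, ...) - mu * bar
}

# Tiny illustrative example: one parameter, constraint theta >= 0
f  <- function(theta) theta^2
ui <- matrix(1, 1, 1)
ci <- 0
R2(-1, 0.5, f, ui, ci, mu = 1e-04)  # Inf, not NaN
R2(0.5, 0.5, f, ui, ci, mu = 1e-04) # finite, as before
```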

1 Answer

Lars Lau Raket (accepted answer)

After normalizing the gradient properly

constrOptim(theta = rep(0, nw), f = function(theta) mean((theta - x)^2),
            grad = function(theta) 2 / nw * (theta - x), ui = ui, ci = ci, 
            method = "BFGS")

I can no longer replicate the problem. The supplied gradient did not match the objective: 2 * (theta - x) is the gradient of sum((theta - x)^2), not of mean((theta - x)^2), so the objective term carried the wrong weight relative to the logarithmic barrier term in the internal gradient.

However, I still think that returning Inf outside the boundary would be more robust than returning NaN.
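A quick central finite-difference check (a sketch using only base R; the x values and the helper num_grad are illustrative, not part of constrOptim) confirms that 2 / nw * (theta - x), and not 2 * (theta - x), is the gradient of mean((theta - x)^2):

```r
# Compare the analytic gradient of f(theta) = mean((theta - x)^2)
# against a central finite-difference approximation.
x  <- c(-0.25, 0.08, 0.25)   # illustrative values
nw <- length(x)
f  <- function(theta) mean((theta - x)^2)

num_grad <- function(f, theta, h = 1e-6) {
  sapply(seq_along(theta), function(i) {
    e <- replace(numeric(length(theta)), i, h)
    (f(theta + e) - f(theta - e)) / (2 * h)
  })
}

theta <- rep(0, nw)
max(abs(num_grad(f, theta) - 2 / nw * (theta - x)))  # ~ 0: correct scaling
max(abs(num_grad(f, theta) - 2 * (theta - x)))       # clearly nonzero
```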