When doing constrained optimization using the constrOptim
function, I sometimes get the following error message:
Error in optim(theta.old, fun, gradient, control = control, method = method, :
initial value in 'vmmin' is not finite
Example
x <- c(-0.2496881061155757641767394261478330008685588836669921875,
0.0824038146359631351600683046854101121425628662109375,
0.25000000111421105675191256523248739540576934814453125)
nw <- length(x)
ui <- diag(1, nrow = nw)
ui <- rbind(ui, rep(0, nw))
ui[cbind(2:(nw + 1), 1:nw)] <- -1
ci <- rep(-0.8 / (nw + 1), nw + 1)
constrOptim(theta = rep(0, nw), f = function(theta) mean((theta - x)^2),
grad = function(theta) 2 * (theta - x), ui = ui, ci = ci,
method = "BFGS")
What I know
The problem occurs during the iteration inside constrOptim
, when the result comes so close to the boundary that almost all point evaluated by the BFGS optimizer are NaN
s (excluding the initial point). In this case, BFGS will sometimes return an optimal value of NaN and a corresponding minimizing parameter outside the constraint set.
In constrOptim
, the objective function fed to BFGS is given by
R <- function(theta, theta.old, ...) {
ui.theta <- ui %*% theta
gi <- ui.theta - ci
if (any(gi < 0)) {
return(NaN)
}
gi.old <- ui %*% theta.old - ci
bar <- sum(gi.old * log(gi) - ui.theta)
if (!is.finite(bar))
bar <- -Inf
f(theta, ...) - mu * bar
}
My question
It seems to me that the obvious solution to the problem is to simply return sign(mu) * Inf
instead of NaN
if there are any gi < 0
, but could this fix lead to other problems?
After normalizing the gradient properly
I can no longer replicate the problem. It seems that the issue was caused by the wrong weighting of the gradient of the objective function and the gradient of the logarithmic barrier term in the internal gradient.
However, I still think that returning Inf outside the boundary would be more robust than returning NaN.