I am looking for a way to score model fits of x,y data on a scale from 0 to 1, depending on which model the data best fit. If data fit a linear model best, the score would come out close to 1. Log would be close to 0, and quadratic would be intermediate (0.5, for example).
My problem is that the function below does not differentiate between linear and quadratic models - any ideas on how to fix this? I might be overlooking something simple or going about this the wrong way. I am using R. Sample code and function below;
`# Functions
sat <- function(x, alpha) (1-alpha)*x + alpha*log(x)
lik.sat <- function(alpha, x, y) as.numeric(logLik(lm(y ~ sat(x,alpha))))
est.sat <- function(x, y){
fit <- optim(par=.5, fn=lik.sat, method="Brent", control=list(fnscale=-1), lower=0, upper=1, x=x, y=y)
if(fit$convergence==0)
return(fit$par)
return(NA)
}
# Simulate data
#Linear model
x<- abs(rnorm(100))
lin.y <- x + rnorm(100, sd=.25)
#quad model
quad.y <- x + x^2 + rnorm(100, sd=.25)
#log model
log.y <- log(x) + rnorm(100, sd=.25)
# Fit data
lik.sat(0, x, lin.y)
lik.sat(1, x, lin.y)
lik.sat(0, x, quad.y)
lik.sat(1, x, quad.y)
lik.sat(0, x, log.y)
lik.sat(1, x, log.y)
# Estimate transformation
est.sat(x, lin.y) ## close to 1
est.sat(x, quad.y) #### DOESN'T WORK!
est.sat(x, log.y) ## close to 0`
This seems problematic because quadratic has 3 coefficients and the others have 2. Instead of trying to to use a continuous parameter just test each of the 3 models to see which gives the lowest AIC. In each of the three examples below it chooses the correct form.
Note that this can be made automatic and is superior to using nls since lm is much more stable.
We have returned the name of the chosen model but it could readily be modified to return the formula or the lm object depending on what is wanted. In those cases call it using lapply and Map instead of sapply and mapply.
Another example. This one uses the built-in anscombe data set.