Score models based on log, quadratic, or linear fit on a 0-1 scale

Question

Score models based on log, quadratic, or linear fit on a 0-1 scale

72 views Asked by Emma At 19 May 2023 at 08:54

I am looking for a way to score model fits of x,y data on a scale from 0 to 1, depending on which model the data best fit. If data fit a linear model best, the score would come out close to 1. Log would be close to 0, and quadratic would be intermediate (0.5, for example).

My problem is that the function below does not differentiate between linear and quadratic models - any ideas on how to fix this? I might be overlooking something simple or going about this the wrong way. I am using R. Sample code and function below;

`# Functions
sat <- function(x, alpha) (1-alpha)*x + alpha*log(x)

lik.sat <- function(alpha, x, y) as.numeric(logLik(lm(y ~ sat(x,alpha))))

est.sat <- function(x, y){
  fit <- optim(par=.5, fn=lik.sat, method="Brent", control=list(fnscale=-1), lower=0, upper=1, x=x,   y=y)
  if(fit$convergence==0)
    return(fit$par)
  return(NA)
}

# Simulate data
#Linear model
x<- abs(rnorm(100))
lin.y <- x + rnorm(100, sd=.25)

#quad model
quad.y <- x + x^2 +  rnorm(100, sd=.25)

#log model
log.y <- log(x) + rnorm(100, sd=.25)

# Fit data
lik.sat(0, x, lin.y)
lik.sat(1, x, lin.y)
lik.sat(0, x, quad.y)   
lik.sat(1, x, quad.y)
lik.sat(0, x, log.y)
lik.sat(1, x, log.y)

# Estimate transformation
est.sat(x, lin.y)  ## close to 1
est.sat(x, quad.y)  #### DOESN'T WORK!
est.sat(x, log.y)  ## close to 0`

Original Q&A

There are 1 answers

**G. Grothendieck** · Answer 1 · 2023-05-20T01:39:22+00:00

This seems problematic because quadratic has 3 coefficients and the others have 2. Instead of trying to to use a continuous parameter just test each of the 3 models to see which gives the lowest AIC. In each of the three examples below it chooses the correct form.

set.seed(123)
x<- abs(rnorm(100))
lin.y <- x + rnorm(100, sd=.25)
quad.y <- x + x^2 +  rnorm(100, sd=.25)
log.y <- log(x) + rnorm(100, sd=.25)
 
fos <- list(log = log.y ~ log(x), lin = log.y ~ x, quad = log.y ~ poly(x, 2))
sapply(fos, function(fo) AIC(lm(fo))) # log has lowest AIC
##       log       lin      quad 
##  17.15268 203.86473 136.72457 

fos <- list(log = lin.y ~ log(x), lin = lin.y ~ x, quad = lin.y ~ poly(x, 2))
sapply(fos, function(fo) AIC(lm(fo))) # lin has lowest AIC
##       log       lin      quad 
## 81.577530  3.408632  5.407171 

fos <- list(log = quad.y ~ log(x), lin = quad.y ~ x, quad = quad.y ~ poly(x, 2))
sapply(fos, function(fo) AIC(lm(fo))) # quad has lowest AIC
##      log      lin     quad 
## 313.2159 110.9217   1.0147

Note that this can be made automatic and is superior to using nls since lm is much more stable.

We have returned the name of the chosen model but it could readily be modified to return the formula or the lm object depending on what is wanted. In those cases call it using lapply and Map instead of sapply and mapply.

opt <- function(x, y) {
  nm <- if (length(unique(x)) < 3) {
   "lin"
  } else {
    fos <- list(log = y ~ log(x), lin = y ~ x, quad = y ~ poly(x, 2))
    s <- sapply(fos, function(fo) AIC(lm(fo)))
    names(s)[which.min(s)]
  }
  nm

  # optionally uncomment 1 of next 2 lines
  # fos[[nm]]
  # lm(fos[[nm]])

}

sapply(list(A = log.y, B = lin.y, C = quad.y), opt, x = x)
##      A      B      C 
##  "log"  "lin" "quad"

Another example. This one uses the built-in anscombe data set.

mapply(opt, anscombe[1:4], anscombe[5:8])
##     x1     x2     x3     x4 
##  "log" "quad"  "lin"  "lin"

TechQA.

Score models based on log, quadratic, or linear fit on a 0-1 scale

There are 1 answers

Related Questions in R

Related Questions in MODEL

Related Questions in REGRESSION

Related Questions in LOGARITHM

Related Questions in QUADRATIC

Popular Questions

Trending Questions