Finding the best bandwidth for kernel smoothing regression in R

I have simulated bivariate data (x, y) where y has mean 1/x and some variance. The data looks something like this: [image: Data]

I am using kernel smoothing regression to try and find this relationship.

kernelreg = ksmooth(train_points$x, train_points$y, kernel = "normal", bandwidth = h)
plot(y ~ x, train_points, cex = 0.5, col = "dodgerblue", main = "Data set")
lines(kernelreg, lwd = 2, col = 2)
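For reference, the ksmooth() documentation describes it as the Nadaraya-Watson kernel regression estimate, with kernels scaled so that their quartiles lie at ±0.25 × bandwidth. For the normal kernel with bandwidth h, that works out to the formula below (a sketch; φ and Φ are the standard normal density and distribution function, and the notation $\hat m_h$ is introduced here, not taken from the docs):

$$
\hat m_h(x_0) = \frac{\sum_{i=1}^{n} \phi\!\left(\frac{x_0 - x_i}{s}\right) y_i}{\sum_{i=1}^{n} \phi\!\left(\frac{x_0 - x_i}{s}\right)},
\qquad s = \frac{0.25\,h}{\Phi^{-1}(0.75)} \approx 0.3707\,h
$$

A smaller h averages each fitted value over fewer neighbours, giving a wigglier curve. Unless x.points is supplied, ksmooth() evaluates this estimate on max(100, length(x)) evenly spaced points over range(x), not at the observed x values.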

I am wondering how I can write a function that runs this regression over a list of bandwidths and computes the RMSE on both the training and the test data, so that I can find the optimal bandwidth, i.e. the one that minimizes the model's error.
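The quantity being compared across bandwidths is the root-mean-square error. Writing $\hat m_h(x_i)$ for the smoothed value at $x_i$ with bandwidth h (notation assumed here), and letting the sum run over whichever n points are being evaluated (training or test):

$$
\mathrm{RMSE}(h) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat m_h(x_i)\bigr)^2}
$$

On the training points this keeps falling as h shrinks, because each fitted value leans more and more on its own $y_i$. That is why the bandwidth should be picked by the test (held-out) RMSE rather than the training RMSE.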

Answer by Elia

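You can put your model into a function and iterate over the bandwidth argument with lapply() (or sapply()). Then calculate the RMSE for each run and take the bandwidth with the smallest error. Compute that RMSE on held-out test data: on the training data it keeps decreasing as the bandwidth shrinks, so it would always favour the smallest bandwidth.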

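library(caret) # for the RMSE() function
set.seed(5)
x <- runif(1000)
y <- 20*(1/exp(x*20)) + runif(1000, 1, 5)
plot(x, y)
df <- data.frame(x, y)
# keep 70% of the rows for training and the rest for testing
# (sample(1:nrow(df), nrow(df)) would select every row and leave the test set empty)
ind <- sample(nrow(df), size = 0.7 * nrow(df))
train_points <- df[ind, ]
test_points <- df[-ind, ]

mykern <- function(train, test, bw) {
  # ksmooth() returns its fit sorted by x.points, so sort both sets by x
  # to line the fitted values up with the observed y values
  train <- train[order(train$x), ]
  test <- test[order(test$x), ]
  # RMSE of the smoother fitted on `train`, evaluated at the x values of `newdata`
  rmse_at <- function(h, newdata) {
    fit <- ksmooth(train$x, train$y, kernel = "normal", bandwidth = h,
                   x.points = newdata$x)
    RMSE(fit$y, newdata$y, na.rm = TRUE)
  }
  rmse <- data.frame(bandwidth = bw,
                     train = sapply(bw, rmse_at, newdata = train),
                     test = sapply(bw, rmse_at, newdata = test))
  # choose the bandwidth by test (held-out) error, not training error
  best.bw <- bw[which.min(rmse$test)]
  best.kern <- ksmooth(train$x, train$y, kernel = "normal", bandwidth = best.bw)
  list(rmse = rmse, best.model = best.kern, best.bandwidth = best.bw)
}

kernelreg <- mykern(train_points, test_points,
                    bw = seq(0.1, 1, 0.1))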
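A single train/test split can give a noisy choice of bandwidth. A common alternative, swapped in here and not part of the original answer, is K-fold cross-validation: each fold is held out in turn, ksmooth() is fitted on the remaining rows, and the held-out squared errors are averaged. A minimal sketch, assuming the same data-generating process as the simulation above; cv_ksmooth() and its k argument are hypothetical names:

```r
# Hypothetical helper: K-fold cross-validated RMSE for a grid of ksmooth() bandwidths
cv_ksmooth <- function(x, y, bw, k = 5) {
  # assign every observation to one of k folds at random
  folds <- sample(rep(seq_len(k), length.out = length(x)))
  cv_rmse <- sapply(bw, function(h) {
    fold_mse <- sapply(seq_len(k), function(f) {
      tr <- folds != f
      te <- which(folds == f)
      te <- te[order(x[te])]  # ksmooth() returns fits sorted by x.points
      fit <- ksmooth(x[tr], y[tr], kernel = "normal", bandwidth = h,
                     x.points = x[te])
      mean((fit$y - y[te])^2, na.rm = TRUE)  # held-out MSE for this fold
    })
    sqrt(mean(fold_mse))  # average the fold MSEs, then take the square root
  })
  data.frame(bandwidth = bw, cv_rmse = cv_rmse)
}

set.seed(5)
x <- runif(1000)
y <- 20 * exp(-20 * x) + runif(1000, 1, 5)

res <- cv_ksmooth(x, y, bw = seq(0.02, 0.5, by = 0.02))
best_bw <- res$bandwidth[which.min(res$cv_rmse)]
plot(res$bandwidth, res$cv_rmse, type = "b",
     xlab = "bandwidth", ylab = "CV RMSE")
```

The grid here includes bandwidths below 0.1 because the simulated curve decays quickly near x = 0; widen or refine it if the minimum lands on the edge of the grid.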

However, take a look at the KernSmooth package, as the documentation of ksmooth suggests:

This function was implemented for compatibility with S, although it is nowhere near as slow as the S function. Better kernel smoothers are available in other packages such as KernSmooth.
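Following that suggestion, KernSmooth also offers an automatic bandwidth selector. A minimal sketch, assuming the answer's simulated data: dpill() picks a direct plug-in bandwidth for local linear regression with a Gaussian kernel, and locpoly() fits that local linear smoother on a grid. Note that this is local linear regression, a different estimator from the local-constant (Nadaraya-Watson) fit that ksmooth() computes.

```r
library(KernSmooth)  # recommended package, ships with R

set.seed(5)
x <- runif(1000)
y <- 20 * exp(-20 * x) + runif(1000, 1, 5)

h <- dpill(x, y)                                 # direct plug-in bandwidth
fit <- locpoly(x, y, degree = 1, bandwidth = h)  # local linear fit on a 401-point grid

plot(x, y, cex = 0.5, col = "dodgerblue")
lines(fit, lwd = 2, col = 2)
```

The two packages scale bandwidth differently: KernSmooth uses the bandwidth as the standard deviation of its Gaussian kernel, while the normal kernel in ksmooth() has a standard deviation of about 0.3707 × bandwidth, so a KernSmooth value h corresponds roughly to h / 0.3707 in ksmooth() units.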