I'm trying to perform multiple LASSO regressions in R using the following code:
library(readxl)
data <- read_excel("data.xlsx") # 20x20 matrix
library(glmnet)
library(coefplot)
A <- as.matrix(data)
# For each column i, regress it on all remaining columns
results <- lapply(seq_len(ncol(A)), function(i) {
  list(
    fit_lasso = glmnet(A[, -i], A[, i], standardize = TRUE, alpha = 1),
    cvfit = cv.glmnet(A[, -i], A[, i], standardize = TRUE, type.measure = "mse", nfolds = 10, alpha = 1)
  )
})
# Keep only the nonzero coefficients at lambda.min for each fit
coefficients <- lapply(results, function(x, fun) fun(coef(x$cvfit, s = "lambda.min")), function(x) x[x[, 1L] != 0L, 1L, drop = FALSE])
My output results is a Large list (20 elements, 1 MB) holding the same LASSO output for each of the 20 variables, and coefficients contains only the variables with nonzero coefficients in each case.
I notice that for the same dataset the results are not always the same, maybe because lambda takes different values in each run? I'm not sure. I want my code to find the same lambda.min values and always give the same results when I run it on the dataset. I believe set.seed() might manage it, but I can't figure out how to include it properly.
How can I always make it print the same outputs for a specific dataset?
I got it to produce the same lambda.min values from run to run just by putting set.seed() before the list. That way, you're setting the seed for the random draws of the cross-validation runs, i.e. the random assignment of observations to folds that cv.glmnet makes.
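As a minimal sketch of that idea, the snippet below seeds the random number generator once, right before the loop over columns, and checks that two runs give the same lambda.min values. The simulated matrix stands in for the contents of data.xlsx, and the seed value 123 and the helper names run_lasso and lambda_min are arbitrary choices for illustration, not part of the original code.

```r
library(glmnet)

# Simulated stand-in for the data read from data.xlsx (assumption):
# 100 observations of 20 variables.
set.seed(1)
A <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)

# Hypothetical helper: seed once, then cross-validate one LASSO per column.
run_lasso <- function(A, seed = 123) {
  set.seed(seed)  # fixes the random fold assignments made by cv.glmnet
  lapply(seq_len(ncol(A)), function(i) {
    cv.glmnet(A[, -i], A[, i], standardize = TRUE,
              type.measure = "mse", nfolds = 10, alpha = 1)
  })
}

# Hypothetical helper: pull lambda.min out of each cross-validated fit.
lambda_min <- function(fits) {
  vapply(fits, function(f) f$lambda.min, numeric(1))
}

first  <- lambda_min(run_lasso(A))
second <- lambda_min(run_lasso(A))

# Should be TRUE when both runs start from the same seed.
identical(first, second)
```

Seeding once before lapply makes the whole run reproducible as a unit. Another option, if you prefer not to depend on the seed, is to build a fold assignment vector yourself and pass it to cv.glmnet through its foldid argument.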