I'm trying to calculate the Variance Inflation Factor (VIF) in R. I use the following code:
library(readxl)
data <-read_excel("data.xlsx")
# Generic: dispatches to vif.default for data frames and matrices
vif.Rsquared <- function(object)
  UseMethod("vif")
vif.default <- function(object) {
  if (!is.data.frame(object) && !is.matrix(object)) stop("Not matrix or data frame")
  if (is.data.frame(object)) object <- as.matrix(object)
  ncols <- dim(object)[2]
  v <- numeric(ncols)
  names(v) <- dimnames(object)[[2]]
  # Regress each column on all the others and convert R-squared to a VIF
  for (i in seq_len(ncols)) v[i] <- 1 / (1 - summary(lm(object[, i] ~ object[, -i]))$r.squared)
  v
}
vif.Rsquared(data)
VIFs are calculated by taking each predictor and regressing it against every other predictor in the model. This gives the R-squared value for that predictor, which is then plugged into the VIF formula, VIF = 1/(1 - R^2). But instead of computing the R-squared with all the other X variables as predictors, I want to use only the statistically significant (s.s.) ones. To that end I ran a LASSO regression for each variable, which shows the s.s. predictors in each case:
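library(glmnet)
# A is assumed to be the numeric matrix of predictors, e.g. as.matrix(data)
results <- lapply(seq_len(ncol(A)), function(i) {
  list(
    fit_lasso = glmnet(A[, -i], A[, i], standardize = TRUE, alpha = 0.9),
    cvfit = cv.glmnet(A[, -i], A[, i], standardize = TRUE, type.measure = "mse", nfolds = 5, alpha = 0.9)
  )
})
# Coefficients at the lambda with the lowest cross-validated error
lapply(results, function(x) coef(x$cvfit, s = "lambda.min"))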
The above gives output like the following for every variable:
> lapply(results, function(x) coef(x$cvfit, s = "lambda.min"))
[[1]]
15 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -3.221914e+03
X2 .
X3 2.829016e-05
X4 .
X5 8.122328e-01
X6 3.334640e+00
X7 1.837722e+01
X8 2.913922e+01
X9 .
X10 3.761598e-01
X11 .
X12 5.481565e+00
X13 7.445466e+00
X14 2.265301e-01
X15 .
[[2]]
15 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 0.981845278
X1 .
X3 .
X4 .
X5 .
X6 0.009778893
X7 .
X8 0.147952855
X9 .
X10 -0.002225547
X11 .
X12 .
X13 0.890613534
X14 .
X15 .
and so on for all available variables.
Is there any way to program R to compute the R-squared and adjusted R-squared for each variable using only the predictors that LASSO keeps as statistically significant? How can I modify the vif function above to do this? Is it possible?
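
To make the goal concrete, here is a rough sketch of the kind of function I have in mind. It is only an illustration under assumptions: `vif_lasso` and its arguments are hypothetical names, the input is assumed to be an all-numeric matrix or data frame with column names and at least three columns, and "kept by LASSO" is taken to mean a nonzero coefficient at `lambda.min` (which is a selection rule, not a formal significance test).

```r
library(glmnet)

# Hypothetical helper, not part of the original code: for each column of A,
# run a cross-validated glmnet fit on the remaining columns, keep the
# predictors with nonzero coefficients, and refit with lm() on those only.
vif_lasso <- function(A, alpha = 0.9, nfolds = 5, s = "lambda.min") {
  A <- as.matrix(A)
  stopifnot(is.numeric(A), ncol(A) >= 3, !is.null(colnames(A)))
  # Defaults correspond to "no predictor kept": R-squared 0 and VIF 1
  out <- data.frame(variable = colnames(A), n_selected = 0L,
                    r.squared = 0, adj.r.squared = 0, vif = 1)
  for (i in seq_len(ncol(A))) {
    cvfit <- cv.glmnet(A[, -i, drop = FALSE], A[, i], standardize = TRUE,
                       type.measure = "mse", nfolds = nfolds, alpha = alpha)
    b <- as.matrix(coef(cvfit, s = s))[-1, 1]  # coefficients without the intercept
    keep <- names(b)[b != 0]                   # predictors the fit retains
    if (length(keep) == 0) next                # nothing kept: leave the defaults
    fit <- summary(lm(A[, i] ~ A[, keep, drop = FALSE]))
    out$n_selected[i] <- length(keep)
    out$r.squared[i] <- fit$r.squared
    out$adj.r.squared[i] <- fit$adj.r.squared
    out$vif[i] <- 1 / (1 - fit$r.squared)
  }
  out
}
```

It would be called as `set.seed(1); vif_lasso(data)`; the seed matters because `cv.glmnet` assigns the folds at random, so the selected predictors (and hence the R-squared values) can change between runs.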