Performing multiple regression models with the use of sparse Matrix of class "dgCMatrix" data simultaneously in R

I'm trying to calculate the Variance Inflation Factor (VIF) in R. I use the following code:

    library(readxl)
    data <- read_excel("data.xlsx")

    # S3 generic; data frames and matrices fall through to vif.default
    vif.Rsquared <- function(object)
      UseMethod("vif")

    vif.default <- function(object) {
      if (!is.data.frame(object) && !is.matrix(object)) stop("Not matrix or data frame")
      if (is.data.frame(object)) object <- as.matrix(object)
      ncols <- ncol(object)
      v <- numeric(ncols)
      names(v) <- colnames(object)
      # Regress each column on all the others and convert its R-squared to a VIF
      for (i in seq_len(ncols)) v[i] <- 1 / (1 - summary(lm(object[, i] ~ object[, -i]))$r.squared)
      v
    }

    vif.Rsquared(data)

VIFs are calculated by taking a predictor and regressing it against every other predictor in the model. This gives the R-squared values, which can then be plugged into the VIF formula, VIF = 1 / (1 - R-squared). But instead of computing the R-squared with all of the other X variables as predictors, I want to use only the statistically significant (s.s.) ones. To that end, I ran a LASSO regression for each variable, which shows the variables it keeps in each case:

    library(glmnet)

    # A is assumed to be a numeric matrix with named columns (e.g. as.matrix(data))
    results <- lapply(seq_len(ncol(A)), function(i) {
      list(
        fit_lasso = glmnet(A[, -i], A[, i], standardize = TRUE, alpha = 0.9),
        cvfit = cv.glmnet(A[, -i], A[, i], standardize = TRUE, type.measure = "mse", nfolds = 5, alpha = 0.9)
      )
    })

    lapply(results, function(x) coef(x$cvfit, s = "lambda.min"))

The above gives output like the following for each variable:

> lapply(results, function(x) coef(x$cvfit, s = "lambda.min"))
[[1]]
15 x 1 sparse Matrix of class "dgCMatrix"
                        1
(Intercept) -3.221914e+03
X2           .           
X3           2.829016e-05
X4           .           
X5           8.122328e-01
X6           3.334640e+00
X7           1.837722e+01
X8           2.913922e+01
X9           .           
X10          3.761598e-01
X11          .           
X12          5.481565e+00
X13          7.445466e+00
X14          2.265301e-01
X15          .           

[[2]]
15 x 1 sparse Matrix of class "dgCMatrix"
                       1
(Intercept)  0.981845278
X1           .          
X3           .          
X4           .          
X5           .          
X6           0.009778893
X7           .          
X8           0.147952855
X9           .          
X10         -0.002225547
X11          .          
X12          .          
X13          0.890613534
X14          .          
X15          .          

and so on for all available variables.

Is there any way to program R to find the R-squared and adjusted R-squared for each variable, using only the predictors that LASSO keeps as s.s.? How can I modify the vif function above to do this? Is it possible?
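
One possible approach, sketched here under assumptions rather than offered as a definitive answer: for each column, take the rows of the "dgCMatrix" returned by coef() that are non-zero (the "." entries are zeros), refit an ordinary lm() on just those predictors, and read R-squared, adjusted R-squared and the VIF off the summary. The helper names lasso_selected and vif_lasso, and the simulated A_demo matrix, are hypothetical and only for illustration; the sketch assumes the input has column names. Note that a non-zero LASSO coefficient means the variable was selected, which is not the same thing as being statistically significant.

```r
library(glmnet)

# Hypothetical helper: names of the predictors LASSO keeps (non-zero
# coefficients at lambda.min) when column i of A is the response.
lasso_selected <- function(A, i, alpha = 0.9, nfolds = 5) {
  cvfit <- cv.glmnet(A[, -i], A[, i], standardize = TRUE,
                     type.measure = "mse", nfolds = nfolds, alpha = alpha)
  b <- as.matrix(coef(cvfit, s = "lambda.min"))  # dgCMatrix -> dense matrix
  keep <- rownames(b)[b[, 1] != 0]
  setdiff(keep, "(Intercept)")
}

# Hypothetical helper: R-squared, adjusted R-squared and VIF for every
# column, using only the LASSO-selected predictors in each regression.
vif_lasso <- function(A, ...) {
  A <- as.matrix(A)
  stopifnot(!is.null(colnames(A)))
  out <- lapply(seq_len(ncol(A)), function(i) {
    vars <- lasso_selected(A, i, ...)
    if (length(vars) == 0) {
      # Nothing selected: treat the column as unexplained by the others
      return(data.frame(variable = colnames(A)[i], n_selected = 0,
                        r.squared = 0, adj.r.squared = 0, vif = 1))
    }
    fit <- summary(lm(A[, i] ~ A[, vars, drop = FALSE]))
    data.frame(variable = colnames(A)[i], n_selected = length(vars),
               r.squared = fit$r.squared,
               adj.r.squared = fit$adj.r.squared,
               vif = 1 / (1 - fit$r.squared))
  })
  do.call(rbind, out)
}

# Simulated data for illustration only (not the data from the question)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 5), n, 5)
A_demo <- cbind(X, X[, 1] + X[, 2] + rnorm(n, sd = 0.5))
colnames(A_demo) <- paste0("X", 1:6)

res <- vif_lasso(A_demo)
print(res)
```

With the data from the question, the call would presumably be vif_lasso(as.matrix(data)). Because cv.glmnet assigns folds at random, the selected set (and therefore the values) can change between runs unless a seed is set.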
