R: What is the difference of the Lasso for variable selection between the packages glmnet and hdm

147 views Asked by At

For my PhD I use a Lasso approach in R for variable selection. Now, I used the package glmnet and also hdm. What is the difference of the basic lasso estimator for logistic regression in these two packages? I read the docs and also googled a lot but the only hint that I found was this one which was not very helpful for my exact purpose.

The reason for asking is because my models converge if I use glmnet and they sometimes do not converge when I use hdm. That is why I assume that the difference is in the optimization function. Here is a minimal example:

# Delete environment
rm(list = ls())

# Packages
library(glmnet)
#> Loading required package: Matrix
#> Loaded glmnet 4.1-4
library(hdm)

# get data
data = read.table("https://pastebin.com/raw/gmXk0h2P", sep = ",", header = T)

# do the lasso
lasso_hdm = rlassologit(dep ~ ., data = data)
#> Warning: from glmnet C++ code (error code -1); Convergence for 1th lambda value
#> not reached after maxit=100000 iterations; solutions for larger lambdas returned
#> Warning in getcoef(fit, nvars, nx, vnames): an empty model has been returned;
#> probably a convergence issue
lasso_glm = glmnet(as.matrix(data[,!(names(data) %in% c("dep"))]), data$dep, family = "binomial")

Created on 2022-05-31 by the reprex package (v2.0.1)

Additionally, please find my sessionInfo:

sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13   knitr_1.39        magrittr_2.0.3    R.cache_0.15.0   
#>  [5] rlang_1.0.2       fastmap_1.1.0     fansi_1.0.3       stringr_1.4.0    
#>  [9] styler_1.7.0      highr_0.9         tools_4.2.0       xfun_0.31        
#> [13] R.oo_1.24.0       utf8_1.2.2        cli_3.3.0         withr_2.5.0      
#> [17] htmltools_0.5.2   ellipsis_0.3.2    yaml_2.3.5        digest_0.6.29    
#> [21] tibble_3.1.7      lifecycle_1.0.1   crayon_1.5.1      purrr_0.3.4      
#> [25] R.utils_2.11.0    vctrs_0.4.1       fs_1.5.2          glue_1.6.2       
#> [29] evaluate_0.15     rmarkdown_2.14    reprex_2.0.1      stringi_1.7.6    
#> [33] compiler_4.2.0    pillar_1.7.0      R.methodsS3_1.8.1 pkgconfig_2.0.3

Created on 2022-05-31 by the reprex package (v2.0.1)

In the end I am interested in the theory of both packages and maybe I find a good reason to stick to the glmnet package as this converges.

Thank you so much in advance!

0

There are 0 answers