For my PhD I use the lasso in R for variable selection, and I have tried both the glmnet and the hdm package. What is the difference between the basic lasso estimator for logistic regression in these two packages? I have read the documentation and searched extensively, but the only hint I found was this one, which was not very helpful for my exact purpose.
I am asking because my models converge when I use glmnet but sometimes fail to converge when I use hdm. I therefore assume that the difference lies in the optimization. Here is a minimal example:
# Delete environment
rm(list = ls())
# Packages
library(glmnet)
#> Loading required package: Matrix
#> Loaded glmnet 4.1-4
library(hdm)
# get data
data = read.table("https://pastebin.com/raw/gmXk0h2P", sep = ",", header = TRUE)
# do the lasso
lasso_hdm = rlassologit(dep ~ ., data = data)
#> Warning: from glmnet C++ code (error code -1); Convergence for 1th lambda value
#> not reached after maxit=100000 iterations; solutions for larger lambdas returned
#> Warning in getcoef(fit, nvars, nx, vnames): an empty model has been returned;
#> probably a convergence issue
lasso_glm = glmnet(as.matrix(data[,!(names(data) %in% c("dep"))]), data$dep, family = "binomial")
Created on 2022-05-31 by the reprex package (v2.0.1)
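To narrow the problem down, here is a minimal sketch on simulated data (not my real data; `n`, `p`, and the true coefficients are made up for illustration). The warning above comes from glmnet's C++ code and mentions the "1th lambda value", so my assumption, which I have not verified in the hdm source, is that hdm passes a single penalty value to glmnet instead of letting it compute a whole path. The sketch only uses glmnet and contrasts the two ways of calling it: the default path fit, where each solution is warm-started from the previous one, and a fit at one user-supplied lambda, which starts cold.

```r
library(glmnet)

set.seed(1)
n <- 200                                 # illustrative sample size (assumption)
p <- 50                                  # illustrative number of predictors (assumption)
x <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, 5), rep(0, p - 5))      # made-up sparse coefficient vector
y <- rbinom(n, 1, plogis(drop(x %*% beta)))

# Default call: glmnet chooses a decreasing sequence of lambdas itself
# and warm-starts each fit from the solution at the previous lambda.
fit_path <- glmnet(x, y, family = "binomial")
length(fit_path$lambda)
range(fit_path$lambda)

# Single-lambda call: one value supplied by the user, no warm starts.
# A value from the middle of the path is used here so that both fits are comparable.
lambda_single <- fit_path$lambda[length(fit_path$lambda) %/% 2]
fit_single <- glmnet(x, y, family = "binomial", lambda = lambda_single)

# Compare the coefficients from both routes at the same lambda.
b_path <- as.numeric(coef(fit_path, s = lambda_single))
b_single <- as.numeric(coef(fit_single))
max(abs(b_path - b_single))
```

If my assumption is right, the non-convergence in hdm might come from the particular lambda it chooses being hard to reach from a cold start, rather than from a different estimator.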
Additionally, here is my sessionInfo:
sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_GB.UTF-8
#> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.13 knitr_1.39 magrittr_2.0.3 R.cache_0.15.0
#> [5] rlang_1.0.2 fastmap_1.1.0 fansi_1.0.3 stringr_1.4.0
#> [9] styler_1.7.0 highr_0.9 tools_4.2.0 xfun_0.31
#> [13] R.oo_1.24.0 utf8_1.2.2 cli_3.3.0 withr_2.5.0
#> [17] htmltools_0.5.2 ellipsis_0.3.2 yaml_2.3.5 digest_0.6.29
#> [21] tibble_3.1.7 lifecycle_1.0.1 crayon_1.5.1 purrr_0.3.4
#> [25] R.utils_2.11.0 vctrs_0.4.1 fs_1.5.2 glue_1.6.2
#> [29] evaluate_0.15 rmarkdown_2.14 reprex_2.0.1 stringi_1.7.6
#> [33] compiler_4.2.0 pillar_1.7.0 R.methodsS3_1.8.1 pkgconfig_2.0.3
Ultimately I am interested in the theory behind both packages, and I hope to find a good reason to stick with glmnet, since it converges.
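For reference, this is the objective that glmnet minimizes for `family = "binomial"` with the default `alpha = 1`, as far as I understand its documentation. Here $n$ is the number of observations, $y_i \in \{0, 1\}$, $x_i$ is the predictor vector of observation $i$, and $\lambda$ is the penalty level. What I do not know is how the hdm penalty maps onto this $\lambda$.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}
    \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
           - \log\!\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda\,\lVert \beta \rVert_1
```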
Thank you so much in advance!