I use the cv.glmnet() function to estimate a penalized multinomial logit model.
Since the dataset is far too large to make things reproducible, I only show the function call:
cvfit = cv.glmnet(x = X,
                  y = as.numeric(dat$choice_t),
                  family = "multinomial",
                  type.multinomial = "grouped",
                  parallel = TRUE,
                  alpha = 0,
                  nfolds = 5)
Running my code on a "small" subset of the data set works fine, but including all 30 million observations leads to the following error:
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
long vectors (argument 5) are not supported in .C
Calls: cv.glmnet -> glmnet -> lognet -> .Fortran
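The message appears to refer to R's long vectors, i.e. objects with more than 2^31 - 1 elements, which the .C/.Fortran interfaces do not accept. A rough sketch of the arithmetic, assuming a dense numeric design matrix; `n_cols` is a hypothetical column count, since the actual ncol(X) is not shown here:

```r
# Hypothetical dimensions: 30 million rows as in the question; the column
# count is an assumption made only for illustration.
n_obs  <- 30e6
n_cols <- 100

# .C/.Fortran only accept vectors with at most 2^31 - 1 elements.
max_len <- .Machine$integer.max

# Element count of a dense n_obs x n_cols matrix (kept as double to avoid
# integer overflow).
n_elements    <- n_obs * n_cols
exceeds_limit <- n_elements > max_len

# Largest column count that stays under the limit for this many rows.
max_cols <- floor(max_len / n_obs)

# Memory a dense double matrix of this size would need, in GiB.
dense_gib <- n_elements * 8 / 1024^3
```

On the real data, the same check would be prod(dim(X)) > .Machine$integer.max.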
Apart from the many observations, I include many interaction terms, 50 in total.
I already work on a server with 16 GB of memory shared across all 8 CPU cores.
What can I do to avoid this problem?
Are there any options I can set?