Using cv.glmnet on a very large data-set

I use the cv.glmnet() function to estimate a penalized multinomial logit model. Since the dataset is far too large to make the example reproducible, I only show the function call:

cvfit = cv.glmnet(x= X, 
                  y=as.numeric(dat$choice_t) ,
                  family="multinomial", 
                  type.multinomial = "grouped", 
                  parallel = TRUE,
                  alpha=0,
                  nfolds=5)

Running this code on a "small" subset of my data works fine, but including all 30 million observations leads to the following error:

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  : 
  long vectors (argument 5) are not supported in .C
Calls: cv.glmnet -> glmnet -> lognet -> .Fortran

Apart from the many observations, I also include many interaction terms, 50 in total.
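
For context, the error message refers to R's long-vector limit: .C and .Fortran do not accept vectors with more than 2^31 - 1 elements, and argument 5 of the call is most likely the data in X. Below is a minimal sketch of the size check this implies, assuming X is a dense numeric matrix. The value of n_cols is a hypothetical placeholder, since the actual number of columns is not stated here.

```r
# Hypothetical dimensions: n_obs comes from the question,
# n_cols is an assumed placeholder for the number of columns in X.
n_obs  <- 30e6
n_cols <- 80

# Computed in double precision, so the product itself cannot overflow.
n_elements <- n_obs * n_cols

# 2^31 - 1: the largest vector length that .C/.Fortran accept.
limit <- .Machine$integer.max

exceeds_limit <- n_elements > limit

# Widest dense matrix that stays below the limit at this row count.
max_cols <- floor(limit / n_obs)

# A dense double matrix needs 8 bytes per element.
dense_gb <- n_elements * 8 / 1024^3

cat("elements:", n_elements, "limit:", limit, "exceeds:", exceeds_limit, "\n")
cat("max columns at this row count:", max_cols, "\n")
cat("dense size in GB:", round(dense_gb, 1), "\n")
```

With the real matrix, the same check would be prod(dim(X)) > .Machine$integer.max. If it is TRUE, the whole matrix cannot be passed to the Fortran routine in one piece, regardless of how much memory is available.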

I am already working on a server with 16 GB of memory for all 8 CPU cores.

What can I do to avoid this problem?

Are there any options I can set?
