What is an alternative to leap() that can handle NAs?

3.8k views Asked by At

Need to apply Branch and bound method to choose best model. leaps() from leaps package works well, only if the data has no NA values, otherwise throws an error:

#dummy data
x<-matrix(rnorm(100),ncol=4)
#convert to 0,1,2 - this is a genetic data, NA=NoCall
x<-matrix(round(runif(100)*10) %% 3,ncol=4)
#introduce NA=NoCall
x[1,1] <-NA
#response, case or control
y<-rep(c(0,1,1,0,1),5)
leaps(x,y)

Error in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = NCOL(x) + int,  : 
  NA/NaN/Inf in foreign function call (arg 4)

Using only complete.cases() is not an option as I lose 80% of data.

What is an alternative to leap that can handle NAs? I am writing my own function to do something similar, but it is getting big and clunky, I feel like I am reinventing the wheel...

UPDATE: I have tried using stepAIC(), facing the same problem of missing data:

Error in stepAIC(fit) : 
  number of rows in use has changed: remove missing values?
2

There are 2 answers

1
zx8754 On BEST ANSWER

This is a stats problem, as AIC can't compare models built with different data sets. So to compare models with and without certain variables, you need to remove the rows with missing values for those variables. You may need to "reconsider your modeling strategy", to quote Ben Bolker. Otherwise you may also want to look into variants of AIC, a quick Google search brings up a recent JASA article that might be a good starting point.

- Aaron

1
David Z On

you may try bestglm::bestglm where branch-bound method can be specified. The NAs can be handled by na.action argument as it in glm. see here for additional information: http://cran.r-project.org/web/packages/bestglm/vignettes/bestglm.pdf