How can I apply extreme bounds analysis to a dataset of over 100 variables with the ExtremeBounds package in R?

I have a dataset consisting of 107 variables with 1794 observations. I want to implement Extreme Bounds Analysis in order to determine which of the 106 variables are robustly correlated with the dependent variable throughout a wide range of regressions, each one with a different model specification. I intend to select the most robust variables for my definitive model.

I'm using Marek Hlavac's ExtremeBounds package. I'm trying to run the following line of code:

free <- eba(formula = flg_activacion_0_12 ~ ., data = Data1, k = 0:106, reg.fun = glm, family = binomial(link = "logit"), draws = 100)

The dependent variable flg_activacion_0_12 is a dummy, which is why I chose the binomial family with a logit link in the family argument.

The reg.fun argument tells eba to fit generalized linear models (here, a logit) instead of OLS regressions.

I set the k argument to 0:106, meaning I want to determine whether the variables are robust across models that include up to 106 variables. However, the total number of models to estimate would be immense. There are 106 possible models that include only one explanatory variable. There are 106!/[2!(104!)] = 5,565 possible models that include two explanatory variables, and so on. The argument draws=100 limits the number of models to just 100: it runs only 100 models chosen at random from the immense pool of models that can be written as combinations of the 106 variables.
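To put rough numbers on that pool, here is a quick base-R check. It is only a sketch: it counts subsets of the 106 variables, which is not necessarily the exact set of regressions that eba would enumerate, and the variable names (n_vars, one_var and so on) are made up for illustration.

```r
# Number of candidate explanatory variables (taken from the question).
n_vars <- 106

# Models with exactly one and exactly two explanatory variables.
one_var  <- choose(n_vars, 1)   # 106
two_vars <- choose(n_vars, 2)   # 106! / (2! * 104!) = 5565

# Every possible subset of the 106 variables, including the empty model.
# Mathematically this is 2^106, roughly 8.1e31.
all_subsets <- sum(choose(n_vars, 0:n_vars))

print(c(one_var = one_var, two_vars = two_vars))
print(format(all_subsets, scientific = TRUE, digits = 3))
```

Even at millions of regressions per second, a pool of that size could not be enumerated, which is why some form of random sampling such as draws is needed.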

I believe the argument draws should make this task possible for my computer, but I get the following error messages:

All variables in argument 'focus' must be in the data frame. 

Argument 'k' is too high for the given number of doubtful variables.

I have already checked the documentation. Since I haven't specified which variables are free, which are focus and which are doubtful, all 106 variables should be treated as focus variables. I don't understand why the error suggests that some focus variables are not in my data frame. What am I doing wrong, and how can I do what I intend to do?

1 answer

paoloeusebi (best answer):

I think the problem is with the formula argument, specifically the dot shorthand (~ .). You get the same error with this code:

 library(ExtremeBounds) 
 naive.eba <- eba(formula = mpg ~. , data = mtcars, k = 0:9)

The model works if you instead spell out the explanatory variables in the formula, as the ExtremeBounds vignette does:

 naive.eba <- eba(formula = mpg ~ cyl + carb + disp + hp + vs + drat + wt + qsec + gear + am, data = mtcars, k = 0:9)
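With 106 explanatory variables, typing the formula by hand is impractical. One possible workaround, sketched below, is to build it programmatically with base R's reformulate(). This is an untested suggestion rather than something taken from the vignette: it uses mtcars as a stand-in for Data1, the names dat, dep_var, regressors, eba_formula and sketch.eba are hypothetical, and it assumes eba accepts a formula object stored in a variable.

```r
# Stand-in data: mtcars ships with R. For the question's data, use
# Data1 and "flg_activacion_0_12" instead.
dat <- mtcars
dep_var <- "mpg"

# Every remaining column becomes an explanatory variable.
regressors <- setdiff(names(dat), dep_var)

# Build "mpg ~ cyl + disp + ..." without typing each name or using "~ .".
eba_formula <- reformulate(regressors, response = dep_var)
print(eba_formula)

# Run the analysis only if the package is installed.
if (requireNamespace("ExtremeBounds", quietly = TRUE)) {
  library(ExtremeBounds)
  # A small k keeps each regression small; draws caps how many are run.
  sketch.eba <- eba(formula = eba_formula, data = dat, k = 0:3, draws = 100)
}
```

For the logit case, the same call would presumably also take reg.fun = glm and family = binomial(link = "logit"), as in the question. Note that the working mtcars example pairs 10 explanatory variables with k = 0:9, so the upper end of k likely has to stay below the number of variables (at most 105 here, not 106), which may explain the second error message.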