In R, I have a logistic regression model as follows:

library(caret)

train_control <- trainControl(method = "cv", number = 3)

logit_Model <- train(result ~ ., data = df,
                     trControl = train_control,
                     method = "glm",
                     family = binomial(link = "logit"))

calculatedVarImp <- varImp(logit_Model, scale = FALSE)

I run multiple datasets through the same code, so the variable importance changes for each dataset. Is there a way to get the names of the variables whose overall importance is below some threshold n (e.g. 1), so that I can automate the removal of those variables and rerun the model?

I was unable to get this information from the 'calculatedVarImp' variable by subsetting on the 'Overall' value:

lowVarImp <- subset(calculatedVarImp , importance$Overall <1)

Also, is there a better way of doing variable selection?

Thanks in advance

1 Answer

4
Humpelstielzchen On Best Solutions

You're using the caret package. Not sure if you're aware of this, but caret has a method for stepwise logistic regression using the Akaike Information Criterion: glmStepAIC.

It does not fit every possible subset of predictors. Instead it fits a sequence of models, changing one predictor at a time (via MASS::stepAIC), and stops when no single step lowers the AIC any further.


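library(caret)  # method = "glmStepAIC" also needs the MASS package installed

train_control <- trainControl(method = "cv", number = 3)

logit_Model <- train(y ~ ., data = train_data,
                     trControl = train_control,
                     method = "glmStepAIC",
                     family = binomial(link = "logit"),
                     na.action = na.omit)

# The model selected by the stepwise search
logit_Model$finalModel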

This is as automated as it gets, but it may be worth reading this answer about the downsides of this method:

See Also.
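
If you would still rather use the importance threshold from the question, here is a minimal sketch. The object returned by varImp() is a list, not a data frame, which is why subset() on it fails; the scores sit in its $importance element, with the model terms as row names and the score in the Overall column (for glm this is the absolute value of each coefficient's test statistic). The data below is simulated purely for illustration, and names such as threshold, keepVars and reduced_Model are my own placeholders, not anything from the question. Note that factor predictors show up under their dummy-coded term names, so they would need mapping back to the original columns before refitting.

```r
library(caret)

# Simulated data, purely for illustration (not the asker's data)
set.seed(42)
n <- 300
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
# Only x1 and x2 drive the outcome; x3 and x4 are noise
p <- plogis(1.5 * df$x1 - 1.0 * df$x2)
df$result <- factor(ifelse(runif(n) < p, "yes", "no"))

train_control <- trainControl(method = "cv", number = 3)

logit_Model <- train(result ~ ., data = df,
                     trControl = train_control,
                     method = "glm",
                     family = binomial(link = "logit"))

calculatedVarImp <- varImp(logit_Model, scale = FALSE)

# One row per model term: term name as row name, score in column "Overall"
imp <- calculatedVarImp$importance

threshold <- 1  # assumed cutoff, the "n" from the question
lowVarImp <- rownames(imp)[imp$Overall < threshold]
keepVars  <- setdiff(rownames(imp), lowVarImp)

# Refit using only the retained predictors (skipped if nothing survives)
if (length(keepVars) > 0) {
  reduced_Model <- train(reformulate(keepVars, response = "result"), data = df,
                         trControl = train_control,
                         method = "glm",
                         family = binomial(link = "logit"))
}
```

Wrapping the last few lines in a function that takes the data frame and the threshold would let the same step run over each dataset. The caveats about stepwise selection apply equally to thresholding on importance, so treat the reduced model with the same caution.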