I have a data frame with 42 columns and 545 rows. One column (column 4) is my dependent variable, and 38 columns (5 to 42) are my predictors. I need to run recursive feature elimination (RFE) to find the best model and the best set of variables.
1- First, I loaded my libraries:
library(caret)
library(mlbench)
library(Hmisc)
library(randomForest)
library(raster)
library(recipes)
library(tidyverse)
library(doParallel)
library(ggplot2)
2- Then, I assigned my dependent variable to Y and my predictors to X, and I normalized the predictors as follows:
X <- RFE[,c(5:42)] # It is a data frame of predictor variables
Y <- RFE[,c(4)] # This is BD_15
## We normalize the predictors and put them in a data frame.
normalization <- preProcess(X)
X <- predict(normalization, X)
X <- as.data.frame(X)
3- After this, I specified my control parameters and then ran the rfe() function as follows:
set.seed(10)
# RFE Control parameters
ctrl <- rfeControl(functions = caretFuncs,
                   method = "repeatedcv",
                   rerank = TRUE,
                   repeats = 5,
                   number = 5,
                   allowParallel = TRUE)
# RFE models
RF_38 <- rfe(X, Y,
             sizes = 1:38,
             method = "rf",
             rfeControl = ctrl,
             tuneGrid = data.frame(mtry = 6))
After running this rfe() model, I get the following error message:
Error in { : task 1 failed - "replacement has 1 row, data has 0"
I have run rfe() successfully many times before without any problems, but now I get this error. I checked my data frame: the values look fine, and there are no missing values. Why do I get this error, and how can I run rfe() without it? Please help me.
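The "task 1 failed" prefix comes from the foreach loop that rfe() uses to run its resampling iterations: it means the first task stopped with an error, and the quoted text is that underlying error. The "replacement has 1 row, data has 0" part is the message R raises when a value is assigned as a column of a data frame that has zero rows, so (apparently) an intermediate data frame inside that task came back empty and the replacement no longer matched its dimensions.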
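To show where the inner message comes from, here is a minimal base R sketch that triggers the same text on purpose. The object name `empty` is made up for illustration, and this does not pinpoint which line inside rfe() failed in my case. It only demonstrates the kind of operation that produces the message.

```r
# A data frame that has a column but zero rows
empty <- data.frame(x = numeric(0))

# Assigning a length-1 value as a new column cannot fit into zero rows,
# so R stops with an error; tryCatch() captures the message as text
msg <- tryCatch({
  empty$y <- 1
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

In an English locale the printed message should match the quoted part of the rfe() error, which suggests that something upstream (for example, a failed model fit in a worker) returned an empty result.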
How I resolved the problem: since rfe() runs many nested model fits, it needs a lot of RAM (memory). My computer had 32 GB of RAM, which was still insufficient. I moved my script and data to a more powerful computer with 64 GB of RAM, and the problem was resolved.
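If a bigger machine is not available, a possible alternative (which I have not verified against my own data) is to reduce memory pressure by running sequentially or with fewer parallel workers, since each worker keeps its own copy of the data. Running sequentially may also surface the underlying error more directly. The sketch below uses small simulated data (`X_demo` and `Y_demo` are hypothetical stand-ins for X and Y) and reduced settings so that it finishes quickly.

```r
library(caret)
library(doParallel)

# Hypothetical stand-in data: 60 rows, 6 predictors
set.seed(10)
X_demo <- as.data.frame(matrix(rnorm(60 * 6), ncol = 6))
Y_demo <- X_demo$V1 + rnorm(60)

# Option 1: force sequential execution (lowest memory use)
foreach::registerDoSEQ()
ctrl_seq <- rfeControl(functions = caretFuncs,
                       method = "cv",
                       number = 3,
                       allowParallel = FALSE)
fit_seq <- rfe(X_demo, Y_demo,
               sizes = c(2, 4),
               rfeControl = ctrl_seq,
               method = "rf",
               tuneGrid = data.frame(mtry = 2))
print(predictors(fit_seq))

# Option 2: keep parallelism but cap the number of workers
cl <- makePSOCKcluster(2)   # 2 is an arbitrary small number
registerDoParallel(cl)
print(getDoParWorkers())    # workers that rfe() would use with allowParallel = TRUE
stopCluster(cl)
foreach::registerDoSEQ()    # restore the sequential backend
```

With the real data, the same pattern would use X, Y, sizes = 1:38 and the original repeatedcv settings. Whether this avoids the error on a 32 GB machine is an assumption, not something I tested.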