I want to use repeated random 80%/20% train/test splits because my dataset has only ~800 individuals and a 5% event rate.
library(caret)
library(purrr)
library(tibble)
# example data
dd_cleannames = data.frame(class = sample(c(0, 1), 100, replace = TRUE), var1 = sample(1:5, 100, replace = TRUE), var2 = sample(10:20, 100, replace = TRUE))
#create data partitions for 5 random training sets
set.seed(100)
indices <- caret::createDataPartition(dd_cleannames$class, p = 0.8,times = 5,list = FALSE)
Following this answer from another SO thread (https://stackoverflow.com/a/59276788/4685471), I built the training and testing sets like this:
resample_data <- tibble(
training_sets = map(indices, ~ dd_cleannames[.x, ]),
testing_sets = map(indices, ~ dd_cleannames[-.x, ])
)
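For reference, here is a quick check of what these objects look like (a minimal sketch that rebuilds the example data so it runs on its own; the checks are only for illustration). With list = FALSE, `indices` is a matrix with one column per resample, and `resample_data$training_sets` is a list, not a single data frame:

```r
library(caret)
library(purrr)
library(tibble)

set.seed(100)
dd_cleannames <- data.frame(
  class = sample(c(0, 1), 100, replace = TRUE),
  var1  = sample(1:5, 100, replace = TRUE),
  var2  = sample(10:20, 100, replace = TRUE)
)

indices <- createDataPartition(dd_cleannames$class, p = 0.8, times = 5, list = FALSE)

is.matrix(indices)  # TRUE: list = FALSE returns a matrix of row numbers
ncol(indices)       # 5: one column per resample (times = 5)

resample_data <- tibble(
  training_sets = map(indices, ~ dd_cleannames[.x, ]),
  testing_sets  = map(indices, ~ dd_cleannames[-.x, ])
)

# The column that is later passed to train() as `data`
is.list(resample_data$training_sets)        # TRUE
is.data.frame(resample_data$training_sets)  # FALSE
```

So whatever `train()` receives in `data` below is a list of data frames rather than one data frame with the columns class, var1 and var2.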
Now I create my control:
ctrl = trainControl(method = "LGOCV",
number = 5,
p = 0.8,
classProbs = TRUE,
summaryFunction = twoClassSummary)
but when I try to fit a boosted model with method = "gbm", I get an error:
gbm = train(class ~ ., data = resample_data$training_sets,
method = "gbm",
trControl = ctrl,
verbose = FALSE)
Error in terms.formula(formula, data = data) :
duplicated name 'var1' in data frame using '.'
Alternatively, I tried the same workflow with list = TRUE in createDataPartition(), but I get a different error:
set.seed(100)
indices <- caret::createDataPartition(dd_cleannames$class, p = 0.8,times = 5,list = TRUE)
resample_data <- tibble(
training_sets = map(indices, ~ dd_cleannames[.x, ]),
testing_sets = map(indices, ~ dd_cleannames[-.x, ])
)
ctrl = trainControl(method = "LGOCV",
number = 5,
p = 0.8,
classProbs = TRUE,
summaryFunction = twoClassSummary)
gbm = train(class ~ ., data = resample_data$training_sets,
method = "gbm",
trControl = ctrl,
verbose = FALSE)
Error in eval(predvars, data, env) : object 'Resample1.class' not found
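A quick check of where the `Resample1.` prefix may come from (a sketch only; it assumes that `train()` ends up coercing the `data` argument to a data frame, which I have not confirmed in the caret source). With list = TRUE the index list is named, `map()` keeps those names, and coercing the resulting list of data frames prefixes every column with the list element name:

```r
library(caret)
library(purrr)

set.seed(100)
dd_cleannames <- data.frame(
  class = sample(c(0, 1), 100, replace = TRUE),
  var1  = sample(1:5, 100, replace = TRUE),
  var2  = sample(10:20, 100, replace = TRUE)
)

indices <- createDataPartition(dd_cleannames$class, p = 0.8, times = 5, list = TRUE)
names(indices)  # "Resample1" ... "Resample5"

# map() preserves the names of its input
training_sets <- map(indices, ~ dd_cleannames[.x, ])
names(training_sets)

# Assumption: mimic a coercion of the whole list to one wide data frame
coerced <- as.data.frame(training_sets)
head(names(coerced))

# The plain column name used in the formula is no longer present
"class" %in% names(coerced)
```

If that assumption holds, the formula class ~ . cannot find a column called class, only prefixed ones such as Resample1.class.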
How can I resolve this?