How to use repeated random training/test splits inside the train() function?


I want to use repeated random 80%/20% splits as training/test sets because my dataset has only ~800 individuals with a 5% event rate.

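#packages used below: caret (createDataPartition, trainControl, train), tibble (tibble), purrr (map)
library(caret); library(tibble); library(purrr)
#example data
dd_cleannames = data.frame(class = sample(c(0,1),100,replace = TRUE),var1 = sample(c(1:5),100,replace = TRUE),var2 = sample(c(10:20),100,replace = TRUE))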

#create data partitions for 5 random training sets
set.seed(100)
indices <- caret::createDataPartition(dd_cleannames$class, p = 0.8,times = 5,list = FALSE) 
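For reference, here is a small sketch of the two shapes that `createDataPartition()` can return. The data frame `toy` is a made-up stand-in, and its outcome is a factor (an assumption that differs from the numeric 0/1 column above) so that sampling is done within each class level.

```r
library(caret)

set.seed(100)
# Hypothetical stand-in data; the outcome is a factor in this sketch
toy <- data.frame(
  class = factor(sample(c("no", "yes"), 100, replace = TRUE)),
  var1  = sample(1:5, 100, replace = TRUE)
)

# list = FALSE: an integer matrix with one column of training row numbers per resample
idx_mat <- createDataPartition(toy$class, p = 0.8, times = 5, list = FALSE)
dim(idx_mat)

# list = TRUE: a named list with one integer vector of training row numbers per resample
idx_list <- createDataPartition(toy$class, p = 0.8, times = 5, list = TRUE)
names(idx_list)
```

In both cases the values are row numbers of the training portion only; the held-out rows are whatever is left over.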

In another SO thread I found this answer: https://stackoverflow.com/a/59276788/4685471

resample_data <- tibble(
  training_sets = map(indices, ~ dd_cleannames[.x, ]),
  testing_sets = map(indices, ~ dd_cleannames[-.x, ])
)
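When the indices are a matrix (list = FALSE), iterating over the matrix directly may walk over individual cells rather than whole columns. A minimal base R sketch of one way to build one training/test pair per column follows; `dd` and `idx_mat` are small hypothetical stand-ins, not the data from above.

```r
# Hypothetical stand-ins for the data frame and the list = FALSE index matrix
dd <- data.frame(class = rep(c(0, 1), 5), var1 = 1:10)
idx_mat <- cbind(Resample1 = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Resample2 = c(2, 3, 4, 5, 6, 7, 9, 10))

# Loop over column positions so that each resample keeps all of its rows together
cols <- seq_len(ncol(idx_mat))
training_sets <- lapply(cols, function(j) dd[idx_mat[, j], ])
testing_sets  <- lapply(cols, function(j) dd[-idx_mat[, j], ])

# Each element is now a full data frame for one resample
sapply(training_sets, nrow)
sapply(testing_sets, nrow)
```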

Now I create my control:

ctrl = trainControl(method = "LGOCV",
                    number = 5,
                    p = 0.8,
                    classProbs = TRUE,
                    summaryFunction = twoClassSummary)

but when I try to fit a generalized boosted model (gbm), I get an error:

gbm = train(class ~ ., data = resample_data$training_sets,
            method = "gbm",
            trControl = ctrl,
            verbose = FALSE)
Error in terms.formula(formula, data = data) : 
  duplicated name 'var1' in data frame using '.'

Alternatively, I tried the same workflow with list = TRUE in createDataPartition(), but then I get a different error:

set.seed(100)
indices <- caret::createDataPartition(dd_cleannames$class, p = 0.8,times = 5,list = TRUE) 

resample_data <- tibble(
  training_sets = map(indices, ~ dd_cleannames[.x, ]),
  testing_sets = map(indices, ~ dd_cleannames[-.x, ])
)


ctrl = trainControl(method = "LGOCV",
                    number = 5,
                    p = 0.8,
                    classProbs = TRUE,
                    summaryFunction = twoClassSummary)


gbm = train(class ~ ., data = resample_data$training_sets,
            method = "gbm",
            trControl = ctrl,
            verbose = FALSE)
Error in eval(predvars, data, env) : object 'Resample1.class' not found

How to resolve this?
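One possible direction, sketched under assumptions rather than as a confirmed fix: `train()` takes a single data frame in `data`, and the resampling is described by `trainControl()`. With method = "LGOCV", number = 5 and p = 0.8, caret already draws repeated 80/20 splits on its own. Pre-computed splits can be supplied through the `index` argument of `trainControl()` as a list of training row numbers. In the sketch below, `dd`, `idx` and `fit` are hypothetical names, and the outcome is recoded as a factor with levels "no"/"yes" because classProbs = TRUE needs factor levels that are valid R names. The gbm and pROC packages are assumed to be installed.

```r
library(caret)

set.seed(100)
# Hypothetical example data with a two-level factor outcome
dd <- data.frame(
  class = factor(sample(c("no", "yes"), 100, replace = TRUE)),
  var1  = sample(1:5, 100, replace = TRUE),
  var2  = sample(10:20, 100, replace = TRUE)
)

# Five random 80% training index sets, returned as a named list
idx <- createDataPartition(dd$class, p = 0.8, times = 5, list = TRUE)

# Hand the pre-computed training rows to caret through `index`
ctrl <- trainControl(method = "LGOCV",
                     index = idx,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

# Pass the whole data frame; caret subsets it per resample using `index`
fit <- train(class ~ ., data = dd,
             method = "gbm",
             metric = "ROC",
             trControl = ctrl,
             verbose = FALSE)

# Per-resample performance on the held-out 20%
fit$resample
```

If the exact splits do not need to be controlled, dropping `index` and keeping number = 5 and p = 0.8 in `trainControl()` should give the same kind of repeated splits.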
