Cross validation with specified number of training data?

Question

159 views · Asked by ordem on 04 October 2020 at 08:35

Objective

I want to perform k-fold cross-validation, but instead of using k-1 folds for training and 1 fold for testing, I want to fix the number of training samples, exactly like the train_size argument of train_test_split, and use the remainder as test data.

To be precise, I have a binary classification dataset, and I want 10 training instances of each class in every cross-validation split.

Expected Function

Let's say I want to do 5-fold CV:

cross_val_score(my_model, X, y, cv=5, train_size=20)  # train_size is the parameter I wish existed

In this case X and y need at least 100 instances (5 folds × 20 training samples) so that the training sets do not overlap.

My Attempt

So far I have built the splits manually. The closest I can get is a loop:

from sklearn.model_selection import train_test_split
for _ in range(5):
    # the keyword is stratify, not stratified
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=20, stratify=y)

But this picks the data at random, so two training sets may share samples, and it does not fit into cross_val_score's cv argument.
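For reference, the repeated stratified train_test_split loop can be handed to cross_val_score through scikit-learn's StratifiedShuffleSplit, which accepts an absolute train_size. The sketch below assumes a synthetic balanced dataset and a LogisticRegression stand-in for my_model; it covers the cv part only, since the random training sets may still overlap between splits.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score

# Synthetic balanced binary dataset standing in for the real X, y (assumption).
rng = np.random.default_rng(0)
y = np.array([0, 1] * 100)                   # 200 samples, 100 per class
X = rng.normal(size=(200, 4)) + y[:, None]   # shift class 1 so the classes are separable

# 5 random stratified splits: 20 training samples each, the remaining 180 for testing.
cv = StratifiedShuffleSplit(n_splits=5, train_size=20, random_state=0)

# LogisticRegression is only a placeholder for my_model.
scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
print(scores.mean())
```

With perfectly balanced classes, stratification puts 10 samples of each class in every training set.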

Note

Yes, this means some of the data is never used for training, but that is what I want for my current work.

Is there any Python function that provides this functionality?

1 Answer

**Danylo Baibak** · Answer 1 · 2020-10-04T13:10:50+00:00

You can still use KFold, but with additional logic.

Determine the number of test samples: test_amount = int(total_amount * test_size).

Determine the number of splits (KFold requires an integer): n_splits = total_amount // test_amount.

Use KFold:

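from sklearn.model_selection import KFold

kf = KFold(n_splits=n_splits)
for train_index, test_index in kf.split(X):
    # X and y are assumed to be NumPy arrays, so index arrays can select rows
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]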
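One way to adapt this idea to the question's goal (a small, fixed training set and everything else as test data) is to swap the roles of the two index arrays, so that the single held-out fold becomes the training set. The sketch below is an assumption-laden illustration, not part of the original answer: small_train_folds is a hypothetical helper, StratifiedKFold replaces KFold to keep the classes balanced, and the data and LogisticRegression model are placeholders. Fold sizes are exactly train_size only when each class count divides evenly by the number of splits.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def small_train_folds(X, y, train_size, n_folds, random_state=0):
    """Hypothetical helper: yield (train_idx, test_idx) with small, disjoint training sets."""
    n_splits = len(y) // train_size  # each fold then holds roughly train_size samples
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for i, (rest_idx, fold_idx) in enumerate(skf.split(X, y)):
        if i >= n_folds:
            break
        # Swap the usual roles: train on the one small fold, test on all other samples.
        yield fold_idx, rest_idx


# Synthetic balanced binary dataset standing in for the real X, y (assumption).
rng = np.random.default_rng(0)
y = np.array([0, 1] * 100)                   # 200 samples, 100 per class
X = rng.normal(size=(200, 4)) + y[:, None]

# cross_val_score accepts an iterable of (train, test) index pairs as cv.
splits = list(small_train_folds(X, y, train_size=20, n_folds=5))
scores = cross_val_score(LogisticRegression(), X, y, cv=splits)
print(scores.mean())
```

Because the folds of a k-fold partition never share samples, the five training sets here are disjoint, and the samples from the unused folds only ever appear in test sets.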
