how can I split data in 3 or more parts with sklearn

7k views Asked by loseryao At 15 September 2017 at 05:44

I want to split my data into train, test, and validation sets that are stratified, but sklearn only provides cross_validation.train_test_split, which can only divide the data into two pieces. What should I do if I want three or more parts?

There are 2 answers

Nathan Karasch On 28 September 2018 at 16:10

You can also use train_test_split more than once to achieve this. The second time, run it on the training output from the first call to train_test_split.

from sklearn.model_selection import train_test_split

def train_test_validate_stratified_split(features, targets, test_size=0.2, validate_size=0.1):
    # Get test sets
    features_train, features_test, targets_train, targets_test = train_test_split(
        features,
        targets,
        stratify=targets,
        test_size=test_size
    )
    # Run train_test_split again to get train and validate sets
    post_split_validate_size = validate_size / (1 - test_size)
    features_train, features_validate, targets_train, targets_validate = train_test_split(
        features_train,
        targets_train,
        stratify=targets_train,
        test_size=post_split_validate_size
    )
    return features_train, features_test, features_validate, targets_train, targets_test, targets_validate
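
To see how the two calls combine, here is a minimal sketch on synthetic data. The arrays, the 80/20 class balance, and the random_state values are made up for illustration. With test_size=0.2 and validate_size=0.1, the second call uses 0.1 / (1 - 0.2) = 0.125, because it only sees the 80% of rows left after the first call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative only): 1000 rows, 20% in the positive class
X = np.arange(2000).reshape(1000, 2)
y = np.array([0] * 800 + [1] * 200)

test_size, validate_size = 0.2, 0.1

# First call: hold out the test set, stratified on the labels
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, stratify=y, test_size=test_size, random_state=0
)

# Second call: rescale the validation fraction relative to the remaining rows
rescaled = validate_size / (1 - test_size)
X_train, X_validate, y_train, y_validate = train_test_split(
    X_rest, y_rest, stratify=y_rest, test_size=rescaled, random_state=0
)

# Report the size and positive-class share of each part
for name, part in [("train", y_train), ("test", y_test), ("validate", y_validate)]:
    print(name, len(part), round(float(part.mean()), 3))
```

Each part should come out close to the requested share of the rows, with roughly the same class balance as the full label array.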

Gambit1614 (Accepted Answer) On 15 September 2017 at 05:57

If you want a stratified train/test split, you can use StratifiedKFold in sklearn.

Suppose X holds your features and y your labels:

from sklearn.model_selection import StratifiedKFold

# X and y are assumed to be NumPy arrays so they can be indexed with index arrays
skf = StratifiedKFold(n_splits=3)
for train_index, test_index in skf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
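
The loop above yields train/test pairs, not a three-way split. One possible way to get three stratified parts out of StratifiedKFold is to collect the held-out indices of each fold and assign whole folds to test, validation, and training. This is only a sketch of that idea, not something the answer itself spells out; the synthetic data, the choice of 5 folds, and the fold assignment are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic data (illustrative only): 100 rows, 70/30 class balance
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array([0] * 70 + [1] * 30)

# Each fold's held-out indices form a stratified, non-overlapping chunk
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = [held_out for _, held_out in skf.split(X, y)]

# Assumed assignment: one fold for test, one for validation, the rest for training
test_index = folds[0]
validate_index = folds[1]
train_index = np.concatenate(folds[2:])

X_train, X_test, X_validate = X[train_index], X[test_index], X[validate_index]
y_train, y_test, y_validate = y[train_index], y[test_index], y[validate_index]

print(len(y_train), len(y_test), len(y_validate))
```

With 5 folds this gives roughly 60% / 20% / 20%. Other ratios are only reachable in multiples of one fold, which is the main limitation of this approach.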

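Update: To split the data into, say, three parts by percentage, you can use numpy.split(). Note that this cuts by position, so it neither shuffles nor stratifies. With the cut points below, the three parts hold 70%, 10%, and 20% of the rows:

import numpy as np

X_train, X_test, X_validate = np.split(X, [int(.7*len(X)), int(.8*len(X))])
y_train, y_test, y_validate = np.split(y, [int(.7*len(y)), int(.8*len(y))])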
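
Because np.split cuts the arrays in their existing order, sorted or grouped data would end up unevenly distributed across the parts. A minimal sketch of one workaround is to apply the same random permutation to X and y before splitting. The synthetic data and the seed are assumptions for illustration, and the result is shuffled but still not stratified.

```python
import numpy as np

# Synthetic data (illustrative only): 50 rows with alternating labels
X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

# Shuffle features and labels with the same permutation so rows stay aligned
rng = np.random.default_rng(42)
perm = rng.permutation(len(X))
X_shuffled, y_shuffled = X[perm], y[perm]

# Cut points at 70% and 80% of the rows, as in the answer above
cut1, cut2 = int(.7 * len(X)), int(.8 * len(X))
X_train, X_test, X_validate = np.split(X_shuffled, [cut1, cut2])
y_train, y_test, y_validate = np.split(y_shuffled, [cut1, cut2])

print(len(X_train), len(X_test), len(X_validate))
```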
