How to split unlabeled data into train and test sets using train_test_split?


I am new to data science and am trying to build my first model. I am confused about the correct way to use the split function. Most documentation recommends the following approach (where X = data and y = label):

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

I have a dataset without labels (only X = data) and want to build a model on it to predict anomalies. That means I can only split my dataset into two portions: X_train and X_test. I am not sure whether this is the correct approach for my dataset, and I would like to know how I should proceed to get y. Thank you in advance for your support.


There is 1 answer

Yuvraj Takey On

You can see the example in the link. The function also works on a single array:

train_test_split(y, shuffle=False)
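
As a small illustration of the single-array call, here is a sketch on a made-up array (the name `values` is only for this example). With shuffle=False the order is kept, and with no test_size given scikit-learn falls back to its default of 0.25:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical one-dimensional data, just to show the call shape.
values = np.arange(10)

# Only one array goes in, so only two pieces come back.
train_part, test_part = train_test_split(values, shuffle=False)

print(train_part)  # first 7 elements, original order
print(test_part)   # last 3 elements, original order
```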

In your case, the answer will be

X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)
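
To make this concrete, here is a minimal sketch of how the split could be used for anomaly detection without labels. The random data and the choice of IsolationForest are assumptions for illustration only; the question does not say which model or dataset is used. An unsupervised detector is fitted on X_train alone, so no y is needed for training:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Hypothetical unlabeled dataset: 100 samples, 3 features.
rng = np.random.RandomState(1)
X = rng.normal(size=(100, 3))

# Split the single array; no y is passed, so only two arrays are returned.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)

# Assumed model choice: an unsupervised anomaly detector fitted without labels.
model = IsolationForest(random_state=1).fit(X_train)

# predict returns +1 for points judged normal and -1 for suspected anomalies.
pred = model.predict(X_test)
```

The predictions on X_test play the role of y here: they are produced by the model rather than supplied with the data. Measuring how accurate they are would still require at least some labeled examples.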