I am new to data science and am trying to build my first model. I am confused about the correct way to use the split function. Most documentation recommends the following approach (where X = data and y = label):
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
I have a dataset without labels (only X = data) and want to build a model on it to detect anomalies. That means I can only split my dataset into two portions: X_train and X_test. But I am not sure whether this is correct for my dataset, and I would like to know how I should proceed to get y. Thank you in advance for your support.
You can see the example in the link. The function also works when you pass it a single array.
In your case, the answer will be
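As a minimal sketch of the single-array split: train_test_split accepts any number of arrays and returns a train/test pair for each one, so passing only X gives back X_train and X_test. The random data and the choice of IsolationForest below are assumptions for illustration only, not something from your post; any unsupervised anomaly detector could take its place. Since there are no labels, there is no y to split. The model's predictions play that role.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Hypothetical unlabeled data: 100 samples, 3 features (stand-in for your X).
rng = np.random.RandomState(1)
X = rng.normal(size=(100, 3))

# With a single input array, the function returns just two pieces.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)

# Assumed model choice: fit an unsupervised detector on the training part only.
model = IsolationForest(random_state=1).fit(X_train)

# predict() returns +1 for points judged normal and -1 for anomalies.
y_pred = model.predict(X_test)
print(X_train.shape, X_test.shape, y_pred[:5])
```

Without ground-truth labels you cannot compute accuracy on X_test. The held-out part is mainly useful for checking that the model flags a plausible share of unseen points.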