Create train and test variables from loaded arff file


I want to perform multilabel classification. I have a dataset in ARFF format, which I load. However, I don't know how to convert the imported data into X and y in order to apply sklearn's train_test_split.

How can I get X and y?

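import pandas as pd
import scipy.io.arff
from sklearn.model_selection import train_test_split

data, meta = scipy.io.arff.loadarff('../yeast-train.arff')
df = pd.DataFrame(data)

# Get X, y
X, y = ???  # <--- what goes here?

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)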
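To see which columns hold features and which hold labels before building X and y, one option is to inspect the `meta` object that `loadarff` returns: `meta.names()` lists the attribute names and `meta.types()` their ARFF types. The sketch below uses a tiny made-up ARFF string (hypothetical, only mimicking the Att*/Class* naming) in place of the real file so that it runs standalone.

```python
from io import StringIO

from scipy.io import arff

# Hypothetical miniature ARFF file using the Att*/Class* naming convention;
# in practice pass the real path, e.g. '../yeast-train.arff', instead.
ARFF_TEXT = """@relation toy
@attribute Att1 numeric
@attribute Att2 numeric
@attribute Class1 {0,1}
@attribute Class2 {0,1}
@data
0.1,0.2,1,0
0.3,0.4,0,1
"""

data, meta = arff.loadarff(StringIO(ARFF_TEXT))

# meta describes every attribute: its name and its ARFF type
names = meta.names()
types = meta.types()
for name, kind in zip(names, types):
    print(name, kind)
```

Numeric columns are usually the features, while nominal `{0,1}` columns are typically the labels in a multilabel ARFF file.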


Answer by Vivek Kumar

OK. It's multilabel data: the features are in columns Att1, Att2, Att3, ..., Att20 and the targets are in columns Class1, Class2, ..., Class14.

So use those columns to build X and y, like this:

# Build the full lists of column names (Att1..Att20 and Class1..Class14)
feature_cols = ['Att%d' % i for i in range(1, 21)]
target_cols = ['Class%d' % i for i in range(1, 15)]

X, y = df[feature_cols], df[target_cols]
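Two practical details are worth handling before calling train_test_split. Hard-coding the number of columns is easy to get wrong, so selecting them by prefix is more robust. Also, `loadarff` returns nominal attributes (such as class labels declared as `{0,1}`) as byte strings like `b'0'` and `b'1'`, while scikit-learn typically expects an integer 0/1 label matrix for multilabel targets. A minimal end-to-end sketch, assuming the columns follow the Att*/Class* prefix convention and using a small hypothetical ARFF string instead of the real file:

```python
from io import StringIO

import pandas as pd
from scipy.io import arff
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for '../yeast-train.arff' (same Att*/Class* naming)
ARFF_TEXT = """@relation toy
@attribute Att1 numeric
@attribute Att2 numeric
@attribute Class1 {0,1}
@attribute Class2 {0,1}
@data
0.1,0.2,1,0
0.3,0.4,0,1
0.5,0.6,1,1
0.7,0.8,0,0
0.9,1.0,1,0
"""

data, meta = arff.loadarff(StringIO(ARFF_TEXT))
df = pd.DataFrame(data)

# Select columns by prefix instead of listing every name by hand
feature_cols = [c for c in df.columns if c.startswith('Att')]
target_cols = [c for c in df.columns if c.startswith('Class')]

X = df[feature_cols].to_numpy()
# Nominal values come back as bytes (b'0', b'1'): decode each column to str,
# then cast to int so y is a numeric 0/1 multilabel indicator matrix
y = df[target_cols].apply(lambda col: col.str.decode('utf-8')).astype(int).to_numpy()

# random_state is optional; it only makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

With the real file, checking `len(feature_cols)` and `len(target_cols)` is a quick way to confirm the feature and label counts match what you expect.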