python sklearn cross_validation /number of labels does not match number of samples

Question

5.2k views Asked by hmmmbob At 20 June 2015 at 20:14

I'm taking a course on machine learning and want to split my data into training and test sets, train a DecisionTreeClassifier on the training set, and then print the score on the test set. The cross-validation parameters in my code were given. Does anyone see what I did wrong?

The error I get is the following:

Traceback (most recent call last):
  File "/home/stephan/ud120-projects/validation/validate_poi.py", line 36, in <module>
    clf = clf.fit(features_train, labels_train)
  File "/home/stephan/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 221, in fit
    "number of samples=%d" % (len(y), n_samples))
ValueError: Number of labels=29 does not match number of samples=66

Here is my code:

import pickle
import sys
sys.path.append("../tools/")
from feature_format import featureFormat, targetFeatureSplit

data_dict = pickle.load(open("../final_project/final_project_dataset.pkl", "r") )

features_list = ["poi", "salary"]

data = featureFormat(data_dict, features_list)
labels, features = targetFeatureSplit(data)

from sklearn import tree
from sklearn import cross_validation

features_train, labels_train, features_test, labels_test = \
    cross_validation.train_test_split(features, labels, random_state=42, test_size=0.3)



clf = tree.DecisionTreeClassifier()
clf = clf.fit(features_train, labels_train)
print clf.score(features_test, labels_test)

There are 2 answers

Aarif1430 On 20 June 2018 at 11:25

You need to pass test_size=0.5 to the train_test_split function:

train_test_split(...,test_size=0.5,...)

Alexander On 20 June 2015 at 20:39 BEST ANSWER

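Your variables are not in the order that train_test_split returns its results. For two input arrays it returns the train and test parts of the first array, followed by the train and test parts of the second. In your code, labels_train therefore receives the test features (29 rows) while features_train holds the training features (66 rows), which is exactly the mismatch the error reports.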

Try:

features_train, features_test, labels_train, labels_test = ...
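For illustration, here is a minimal sketch of the corrected split. It makes a few assumptions that are not in the question: the data is a synthetic stand-in of 95 samples (chosen only because 66 + 29 = 95 matches the numbers in the error message), and train_test_split is imported from sklearn.model_selection, which is where it lives in current scikit-learn releases (the sklearn.cross_validation module used in the question has since been removed).

```python
# Sketch only: synthetic data stands in for the output of
# featureFormat/targetFeatureSplit, which is not reproduced here.
from sklearn import tree
from sklearn.model_selection import train_test_split  # was sklearn.cross_validation in old releases

# Assumption: 95 single-feature samples with made-up binary labels.
features = [[float(i)] for i in range(95)]
labels = [float(i % 2) for i in range(95)]

# train_test_split returns the train/test parts of the first array,
# then the train/test parts of the second array, in that order.
features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, random_state=42, test_size=0.3
)

# Features and labels now have matching lengths within each split.
print(len(features_train), len(labels_train))
print(len(features_test), len(labels_test))

clf = tree.DecisionTreeClassifier()
clf.fit(features_train, labels_train)
score = clf.score(features_test, labels_test)
print(score)
```

With the names in this order, fit receives features and labels of the same length, so the ValueError should no longer be raised. The score printed here says nothing about the real dataset, since the data is made up.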
