How to solve sklearn error: "Found input variables with inconsistent numbers of samples"?


I have a problem doing a 70/30 train/test split with sklearn. I get an error on this line:

X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)

The error is:

Found input variables with inconsistent numbers of samples

Context

import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

sm = SMOTE(k_neighbors=1)
X = data.drop('cluster', axis=1)
y = data['cluster']

# fit_resample replaces fit_sample, which newer imbalanced-learn releases removed
X_smote, y_smote = sm.fit_resample(X, y)

data_bal = pd.DataFrame(columns=X.columns.values, data=X_smote)
data_bal['cluster'] = y_smote

# This is the line that raises the error
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
y_train.value_counts().plot(kind='bar')

Edit

I solved the error: I just had to change stratify=y to stratify=y_smote.
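A minimal sketch of why that change works, assuming only NumPy and scikit-learn are installed. The random duplication of minority rows is a hypothetical stand-in for SMOTE, and the toy data and the names X_res and y_res are illustrative, not the asker's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Imbalanced toy data: 20 samples of class 0 and 6 samples of class 1.
X = rng.normal(size=(26, 3))
y = np.array([0] * 20 + [1] * 6)

# Stand-in for SMOTE (assumption): duplicate random minority rows
# until both classes have 20 samples, giving 40 rows in total.
minority = np.flatnonzero(y == 1)
extra = rng.choice(minority, size=14, replace=True)
X_res = np.vstack([X, X[extra]])
y_res = np.concatenate([y, y[extra]])

# stratify=y still has 26 entries while X_res and y_res have 40,
# so scikit-learn rejects the call.
try:
    train_test_split(X_res, y_res, test_size=0.3, stratify=y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Stratifying on the resampled labels keeps every input the same length.
X_train, X_test, y_train, y_test = train_test_split(
    X_res, y_res, test_size=0.3, stratify=y_res, random_state=0
)
print(len(X_train), len(X_test))
```

The point is that the array passed to stratify must describe the same rows as the arrays being split, so it has to come from the resampled data.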


There are 2 answers

3
Darryl Strachan On

Just an observation in your line of code:

X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)

This error is typically raised when inputs that are expected to have the same length (the same number of samples) do not. Here, stratify=y passes the original labels, while X_smote and y_smote are the resampled arrays, which SMOTE has made longer.

Check the length and/or dimensions of X_smote, y_smote and y to see if they are all as expected.
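One way to run that check is sketched below, assuming NumPy only. The helper report_lengths is hypothetical (it is not part of scikit-learn), and the zero-filled arrays merely stand in for the asker's X_smote, y_smote and y with made-up sizes.

```python
import numpy as np


def report_lengths(**arrays):
    """Return a dict mapping each name to its number of samples."""
    return {name: len(values) for name, values in arrays.items()}


# Placeholder arrays (assumption): 40 resampled rows, 26 original labels.
X_smote = np.zeros((40, 3))
y_smote = np.zeros(40)
y = np.zeros(26)

lengths = report_lengths(X_smote=X_smote, y_smote=y_smote, y=y)
print(lengths)

# More than one distinct length means train_test_split would reject the inputs.
mismatched = len(set(lengths.values())) > 1
print("mismatched:", mismatched)
```

Whichever name reports a different length from the others is the argument to fix.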

0
Yashwant Devatwal On

I had the same issue, but when I changed

x_train,y_train,x_test,y_test = train_test_split(x,y,test_size=0.25,random_state=42)

to

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.25,random_state=42)

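the error went away. train_test_split returns the pieces in the order x_train, x_test, y_train, y_test, so unpacking them in a different order pairs features with labels of a different length.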
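A small sketch of the return order, assuming NumPy and scikit-learn and using made-up data of 20 samples. It only shows how the shapes end up paired when the result is unpacked in the wrong order; the names wrong_x_train and wrong_y_train are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption): 20 samples with 2 features each.
x = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Correct order: train and test features first, then train and test labels.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

# Wrong order: the name meant for the training labels receives the test
# features, so it no longer has as many rows as the training features.
wrong_x_train, wrong_y_train, _, _ = train_test_split(
    x, y, test_size=0.25, random_state=42
)
print(len(wrong_x_train), len(wrong_y_train))
```

Passing a pair like wrong_x_train and wrong_y_train to an estimator's fit method is where the length mismatch would surface.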