Getting KeyError: 'Only the Series name can be used for the key in Series dtype mappings.' when applying SMOTE to a pandas DataFrame

My data is slightly imbalanced, so I am trying to apply SMOTE before fitting a logistic regression model. When I do, I get the error: KeyError: 'Only the Series name can be used for the key in Series dtype mappings.' Could someone help me figure out why? Here is the code:

import pandas as pd
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# dummies: one-hot-encoded DataFrame that includes the target column 'Count'
X = dummies.loc[:, dummies.columns != 'Count']
y = dummies.loc[:, dummies.columns == 'Count']
os = SMOTE(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
columns = X_train.columns
os_data_X, os_data_y = os.fit_sample(X_train, y_train)  # here is where it errors
os_data_X = pd.DataFrame(data=os_data_X, columns=columns)
os_data_y = pd.DataFrame(data=os_data_y, columns=['Count'])

Thank you!
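For context, the message itself comes from pandas rather than from imbalanced-learn: `Series.astype` accepts a dict of dtypes only when that dict has a single key equal to the Series' own name. A minimal pandas-only sketch (the Series and its name `Count` are illustrative, not taken from the asker's data):

```python
import pandas as pd

# A Series whose name is "Count"
s = pd.Series([0, 1, 1], name="Count")

# A dtype mapping keyed by the Series' own name is accepted
converted = s.astype({"Count": "float64"})

# A mapping keyed by anything else raises the KeyError from the question
try:
    s.astype({"other": "float64"})
    error_message = None
except KeyError as exc:
    error_message = str(exc)

print(converted.dtype)  # float64
print(error_message)
```

So the traceback points at some code path that hands a Series a dtype mapping it cannot use; the answers below suggest where that mapping comes from.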

There are 4 answers

Answer by Maxime

I just encountered this problem myself. As it turned out, I had a duplicate column in my dataset. Perhaps double check that this is not the case for your dataset.
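Duplicate labels can be listed directly with `Index.duplicated` instead of eyeballing the frame. The sketch below uses a made-up frame `df` with a repeated `age` column. It also shows a plausible reason duplicates matter: looking that label up in `df.dtypes` returns a Series instead of a single dtype, which is the kind of value that makes `Series.astype` raise this KeyError. That imbalanced-learn hits exactly this path when it restores column dtypes on the resampled frame is an assumption, and the details depend on your pandas and imbalanced-learn versions.

```python
import pandas as pd

# Hypothetical frame where "age" appears twice (e.g. after a careless concat)
df = pd.DataFrame([[30, 30, 1], [40, 40, 0]], columns=["age", "age", "Count"])

# Labels that occur more than once
dupes = df.columns[df.columns.duplicated()].unique().tolist()
print(dupes)  # ['age']

# With a duplicated label, the dtype lookup returns a Series, not one dtype
age_dtype = df.dtypes["age"]
print(type(age_dtype).__name__)  # Series
```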

Answer by devdon

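I actually just fixed this problem! I converted them to NumPy arrays before resampling:

os_data_X, os_data_y = os.fit_sample(X_train.as_matrix(), y_train.as_matrix())

(On pandas 1.0 and later, as_matrix() has been removed, so use X_train.to_numpy() and y_train.to_numpy().ravel() instead; current imbalanced-learn releases also name the method fit_resample rather than fit_sample.)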
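Whichever conversion you use, the arrays carry no column labels, so you have to reattach them if you want a DataFrame back (which the question does with `columns = X_train.columns`). A minimal sketch of that round trip using `DataFrame.to_numpy()`; `X_train` here is a small stand-in frame, and the SMOTE call is deliberately left out so the sketch runs without imbalanced-learn. In practice the array would go through `SMOTE(...).fit_resample` before being wrapped again.

```python
import numpy as np
import pandas as pd

# Stand-in for X_train with two feature columns (illustrative values)
X_train = pd.DataFrame({"x1": [0.0, 1.0, 0.5], "x2": [1.0, 0.0, 0.2]})
columns = X_train.columns  # save the labels before converting

# Plain ndarray: no column labels or per-column dtypes attached
X_arr = X_train.to_numpy()
print(type(X_arr).__name__, X_arr.shape)  # ndarray (3, 2)

# After resampling, wrap the array again with the saved labels
X_back = pd.DataFrame(X_arr, columns=columns)
print(X_back.columns.tolist())  # ['x1', 'x2']
```

Note that if the saved labels contain duplicates, wrapping the array reintroduces them, so converting only works around the symptom described in the other answers.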

Answer by Muhammad Imran Zaman

100% correct solution.

Try converting your X features into a NumPy array first and then feeding them to SMOTE:

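import numpy as np
from imblearn.over_sampling import SMOTE

sm = SMOTE()
X = np.array(X)  # plain array: no column labels attached
X, y = sm.fit_resample(X, np.ravel(y))  # fit_sample in older releases; np.ravel also accepts a one-column DataFrame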
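The flattening step matters because the question builds `y` with a boolean column mask, which produces a one-column DataFrame (2-D) rather than a 1-D label vector. `DataFrame` has no `.ravel()` method, whereas `np.ravel` accepts either a DataFrame or a Series. A pandas-only sketch with a made-up `dummies` frame:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the asker's one-hot-encoded frame
dummies = pd.DataFrame({"f1": [1, 0, 1, 0], "Count": [0, 1, 0, 0]})

# Boolean-mask selection, as in the question, keeps a one-column DataFrame
y_frame = dummies.loc[:, dummies.columns == "Count"]
print(y_frame.shape)  # (4, 1)

# np.ravel flattens it to a 1-D array of labels
y_flat = np.ravel(y_frame)
print(y_flat.shape)  # (4,)

# Selecting the column by name gives a 1-D Series directly
y_series = dummies["Count"]
print(y_series.shape)  # (4,)
```

Selecting the target by name (`dummies["Count"]`) avoids the extra dimension in the first place.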

Answer by Beta Ways

This error is usually caused by duplicate column names in your data. To check for duplicate columns, inspect:

df.head()

or df.columns

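To fix it, drop the duplicated column(s) with:

df.drop('column_name', axis=1, inplace=True)

Keep in mind that when several columns share the label 'column_name', this call removes all of them, not just the extra copy. To keep the first copy of each column instead, use df = df.loc[:, ~df.columns.duplicated()].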
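To see the difference concretely, here is a pandas-only sketch on a made-up frame with a repeated `age` column: dropping by label removes both copies, while the `duplicated()` mask keeps the first one.

```python
import pandas as pd

# Hypothetical frame with "age" duplicated
df = pd.DataFrame([[30, 30, 1], [40, 40, 0]], columns=["age", "age", "Count"])

# drop() removes every column carrying the label
dropped_all = df.drop("age", axis=1)
print(dropped_all.columns.tolist())  # ['Count']

# Masking out later repeats keeps the first occurrence of each label
deduped = df.loc[:, ~df.columns.duplicated()]
print(deduped.columns.tolist())  # ['age', 'Count']
```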