SMOTE for balancing data


I am trying to train a GradientBoosting classifier. Since my data are unbalanced, I am considering SMOTE to balance them. I tried the following:

from sklearn.ensemble import GradientBoostingRegressor
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston
from sklearn.metrics import mean_absolute_error

from imblearn.over_sampling import SMOTE

# Split dataset into training set and test set, then oversample the training set
y=df['Label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, stratify=y)
sm = SMOTE(random_state = 42)
X_train_oversampled, y_train_oversampled = sm.fit_sample(X_train, y_train)
X_train = pd.DataFrame(X_train_oversampled, columns=X_train.columns)

But I got this error:

---> 20 X_train = pd.DataFrame(X_train_oversampled, columns=X_train.columns)

/anaconda3/lib/python3.7/site-packages/scipy/sparse/base.py in __getattr__(self, attr)
    689             return self.getnnz()
    690         else:
--> 691             raise AttributeError(attr + " not found")
    692 
    693     def transpose(self, axes=None, copy=False):

AttributeError: columns not found

I do not know what I should replace or how to use SMOTE with X_train and y_train. Could you please tell me how to use it in the proper order?

There is 1 answer

Ben Reiniger (best answer)

You haven't given enough of your code or data, nor the full traceback, to be sure, but the error occurring in the final line indicates that SMOTE is working fine. The error arises because X_train is a sparse matrix (the traceback ends in scipy/sparse/base.py), and sparse matrices have no column names and hence no columns attribute. It looks like you had column names at some point, so you should be able to retrieve them from df.
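
To make that concrete, here is a minimal sketch of one way to carry the column names across a sparse matrix. The helper to_named_frame, the names f0/f1/f2, and the tiny matrix are all hypothetical stand-ins, since the question does not show how X was built. The idea is to keep the names in a plain list (for example, taken from df before it was converted) and pass them in explicitly instead of reading X_train.columns.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def to_named_frame(X_resampled, feature_names):
    """Wrap resampled features in a DataFrame, whether sparse or dense."""
    if sparse.issparse(X_resampled):
        # Sparse matrices carry no column labels, so they are supplied explicitly.
        return pd.DataFrame.sparse.from_spmatrix(X_resampled, columns=feature_names)
    return pd.DataFrame(X_resampled, columns=feature_names)


# Hypothetical stand-in for the question's data: a small sparse feature matrix.
feature_names = ["f0", "f1", "f2"]  # assumed to be recovered from df
X_train = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))

# A sparse matrix has no .columns attribute, which is what the traceback reports.
print(hasattr(X_train, "columns"))

# With imbalanced-learn installed, the resampled inputs would come from:
#   X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
# (fit_resample is the current name; fit_sample was removed in later releases.)
# Here X_train itself stands in for X_res to keep the sketch dependency-free.
X_named = to_named_frame(X_train, feature_names)
print(X_named.columns.tolist())
```

Resampling only the training split, as the question already does, keeps synthetic samples out of the test set, so the split-then-SMOTE order itself does not need to change.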