Are oversampling and undersampling approaches good to build good models?


I just worked on the "Heart Failure Prediction" dataset from Kaggle ( https://www.kaggle.com/andrewmvd/heart-failure-clinical-data ).

I noticed the number of "not dead" cases was larger than the number of "dead" cases, so I used SMOTETomek to resample my data, then measured the accuracy and printed the confusion matrix, which gave noticeably better results than before.

df.DEATH_EVENT.value_counts()

0    202
1     95
Name: DEATH_EVENT, dtype: int64

Accuracy and confusion matrix, before resampling:

0.7888888888888889
[[130  30]
[  8  12]]
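With an imbalanced test set, overall accuracy can hide weak performance on the minority class. As a quick sanity check (assuming scikit-learn's convention that rows are true classes and columns are predictions), per-class recall can be read straight off the matrix above:

```python
import numpy as np

# The "before" confusion matrix from above: rows = true class, cols = predicted
cm = np.array([[130, 30],
               [  8, 12]])

accuracy = np.trace(cm) / cm.sum()            # correct predictions / total
recall_per_class = cm.diagonal() / cm.sum(axis=1)

print(accuracy)          # ~0.789, matching the score above
print(recall_per_class)  # recall for "dead" (class 1) is only 12/20 = 0.6
```

So the ~0.79 accuracy is mostly driven by the majority class; recall on the minority "dead" class is much lower, which is exactly the situation resampling is meant to address.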

The resampling code:

from imblearn.combine import SMOTETomek

# SMOTE oversamples the minority class, then Tomek links are cleaned up
smt = SMOTETomek(random_state=42)
X_res, y_res = smt.fit_resample(X, y)
pd.DataFrame(y_res)['DEATH_EVENT'].value_counts()

1    155
0    155
Name: DEATH_EVENT, dtype: int64

Accuracy and confusion matrix, after resampling:

0.912
[[53  5]
[ 6 61]]

But this was a small dataset (only 297 rows).

In your experience, do oversampling or undersampling approaches lead to better results in general? Or do they produce misleading metrics, so that the model won't perform as well in the real world?
