I have a highly imbalanced dataset (3% Yes, 97% No) of textual documents, each containing a title and an abstract feature. I have transformed these documents into tf.data.Dataset entities with padded batches. Now, I am trying to train a deep learning model on this dataset. With model.fit() in TensorFlow, you have the class_weight parameter to deal with class imbalance; however, I am searching for the best hyperparameters with the keras-tuner library, and its hyperparameter tuners do not expose such an option. Therefore, I am looking for other ways of dealing with class imbalance.
Is there an option to use class weights in keras-tuner? To add, I am already using the precision@recall metric. I could also try a data resampling method, such as imblearn.over_sampling.SMOTE, but as this Kaggle post mentions:
It appears that SMOTE does not help improve the results; however, it makes the network learn faster. Moreover, there is one big problem: this method is not compatible with larger datasets. You have to apply SMOTE on embedded sentences, which takes far too much memory.
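On the class-weight question itself: keras-tuner's tuner.search() passes extra keyword arguments through to model.fit(), so class_weight can usually be supplied there directly. A minimal sketch, where the weight computation mirrors scikit-learn's "balanced" heuristic (n_samples / (n_classes * count)) and the commented tuner call is illustrative, not taken from your code:

```python
import numpy as np

# Example labels mirroring the imbalance: 97 "No" (0) and 3 "Yes" (1).
y = np.array([0] * 97 + [1] * 3)

# "Balanced" weights: n_samples / (n_classes * class_count), so the rare
# class receives a proportionally larger weight during training.
counts = np.bincount(y)
class_weight = {i: len(y) / (len(counts) * c) for i, c in enumerate(counts)}
print(class_weight)  # the minority class gets a much larger weight

# Hypothetical tuner usage (build_model, train_ds, val_ds are assumptions):
# tuner = keras_tuner.RandomSearch(build_model, objective="val_loss", max_trials=10)
# tuner.search(train_ds, validation_data=val_ds, class_weight=class_weight)
```

The same class_weight dict is the one model.fit() accepts, so no tuner-specific support is needed.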
You could change the evaluation metric to fbeta_score (a weighted F-score).
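For example, with scikit-learn's fbeta_score, choosing beta > 1 weights recall more heavily than precision, which suits a rare positive class; the labels below are made up for illustration:

```python
from sklearn.metrics import fbeta_score

# Toy predictions: precision = 2/3, recall = 1.0 for the positive class.
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 1, 1]

# beta=2 favours recall: F2 = 5 * p * r / (4 * p + r).
score = fbeta_score(y_true, y_pred, beta=2)
print(round(score, 3))
```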
Or, if the dataset is large enough, you could try undersampling the majority class.
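Random undersampling can be sketched with NumPy alone, which avoids the memory problem the Kaggle post raises for SMOTE-style oversampling; the arrays below are placeholders, not your data:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

y = np.array([0] * 97 + [1] * 3)        # imbalanced labels
X = np.arange(len(y)).reshape(-1, 1)    # placeholder features

minority_idx = np.flatnonzero(y == 1)
majority_idx = np.flatnonzero(y == 0)

# Keep all minority examples and an equal-sized random subset of the majority.
kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
kept = np.concatenate([minority_idx, kept_majority])
rng.shuffle(kept)

X_bal, y_bal = X[kept], y[kept]
print(np.bincount(y_bal))  # balanced counts
```

imblearn.under_sampling.RandomUnderSampler does the same thing with a scikit-learn-style API if you are already using imblearn.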