I am new to sentiment analysis and am trying to train a model in Python using Jupyter Notebook. I have mostly learnt from online sources and ChatGPT. For this school project, I am doing multi-label classification with a CNN model, a bidirectional RNN model, and an RNN-LSTM model to see which performs best. I have a CSV file of 2000 reviews for training, and there are 6 labels. There is a major class imbalance, as seen below:
- Sad - 933
- Wholesome - 895
- Funny - 808
- Happy - 661
- Exciting - 571
- Nostalgic - 289
Here is what goes on for every model:
- Load the dataset and preprocess reviews (remove symbols, convert to lowercase).
- Split the data into training and testing sets.
- Tokenize reviews and use GloVe pre-trained word embeddings.
- Convert text data to sequences and pad sequences.
- Encode labels using MultiLabelBinarizer.
- Define NN layers.
- Train model.
- Evaluate model.
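For reference, the preprocessing half of the steps above looks roughly like this in my setup (a minimal sketch: the toy reviews, labels, and the hand-built vocabulary are placeholders, and the GloVe lookup is only indicated in a comment since it needs the downloaded vectors file):

```python
import re
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-ins for the real CSV of reviews and their label sets.
reviews = [
    "Such a WHOLESOME movie!!",
    "I cried... so sad, but exciting.",
    "Funny and happy from start to finish :)",
    "Pure nostalgia for 90s kids.",
]
labels = [
    ["Wholesome"],
    ["Sad", "Exciting"],
    ["Funny", "Happy"],
    ["Nostalgic"],
]

# 1. Preprocess: lowercase and strip everything except letters/whitespace.
def clean(text):
    return re.sub(r"[^a-z\s]", "", text.lower())

cleaned = [clean(r) for r in reviews]

# 2. Split into training and testing sets (plain split; proper
#    stratification is awkward for multi-label data).
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.25, random_state=42
)

# 3. Tokenize: build a word index from the training texts only.
vocab = {"<pad>": 0, "<unk>": 1}
for text in X_train:
    for word in text.split():
        vocab.setdefault(word, len(vocab))

# 4. Convert texts to integer sequences and pad to a fixed length.
MAXLEN = 10
def to_padded(texts):
    seqs = [[vocab.get(w, 1) for w in t.split()][:MAXLEN] for t in texts]
    return np.array([s + [0] * (MAXLEN - len(s)) for s in seqs])

X_train_pad = to_padded(X_train)
X_test_pad = to_padded(X_test)

# 5. Encode the label sets as multi-hot vectors.
mlb = MultiLabelBinarizer()
y_train_bin = mlb.fit_transform(y_train)
y_test_bin = mlb.transform(y_test)

print(X_train_pad.shape)  # (3, 10): 3 training reviews, padded to length 10
print(y_train_bin)        # one row per review, one 0/1 column per label

# An embedding matrix built from the GloVe file (one row per vocab entry)
# would then be passed as initial weights to the Embedding layer that
# feeds the CNN / bidirectional RNN / LSTM stack.
```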
Issue 1: Exact same accuracy every epoch
Usually when I train a model, the accuracy fluctuates along with the training loss. Recently, however, the accuracy does not change at all between epochs. Is there any explanation for why this happens? The only thing that changed was the size of the dataset, from 1000 reviews to 2000 reviews.
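One common cause of this in multi-label setups (a guess on my part, since I can't see the training code): with sigmoid outputs thresholded at 0.5, a model that collapses to predicting every label negative gets a binary accuracy equal to the fraction of 0s in the label matrix, and that number never changes from epoch to epoch even while the loss drifts. A quick numpy check of that behaviour, with a made-up label matrix:

```python
import numpy as np

# Hypothetical multi-hot label matrix: 8 reviews x 6 labels,
# imbalanced like the real dataset (mostly 0s).
y_true = np.array([
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
])

# A collapsed model: every sigmoid output stays below 0.5, so every
# thresholded prediction is 0, epoch after epoch.
y_pred = np.zeros_like(y_true)

# Keras-style binary accuracy: fraction of label entries that match.
binary_accuracy = (y_pred == y_true).mean()
print(binary_accuracy)  # 0.6875 -> exactly the fraction of 0s in y_true
```

If `model.predict()` on a few samples shows all probabilities on one side of 0.5, this is likely what is happening; it's worth checking that the output layer is `sigmoid` with `binary_crossentropy` (not `softmax`/`categorical_crossentropy`, which assume one label per review).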
Issue 2: 0 values for precision, recall and F1 score
- Sometimes for some labels
- Sometimes for all labels
I am aware of how accuracy, precision, recall and F1 score are calculated, but I was surprised by why and how I get 0 values from time to time. Sometimes it happens for all the labels, sometimes only for a few, regardless of the model. Any explanation for this? I try to use fewer layers in the NN models, few epochs, and a small batch size because of my small dataset. Maybe I will discard the Nostalgic label completely to get rid of the class imbalance.
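As I understand it, precision, recall, and F1 for a label all become exactly 0 whenever the model produces no true positives for that label, i.e. it never predicts the label positively, which is most likely for minority classes like Nostalgic. A small demonstration with made-up predictions (label index 2 plays the rare class):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Made-up multi-hot ground truth and predictions for 3 labels.
y_true = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
])
# The model never predicts label 2 positive.
y_pred = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
])

# zero_division=0 silences the warning and returns 0 instead of NaN
# when a label has no predicted positives.
p, r, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None, zero_division=0
)
print(p)   # label 2: no predicted positives -> precision 0
print(r)   # label 2: no true positives -> recall 0
print(f1)  # label 2: F1 0 as well
```

All-zero metrics across every label would mean the model predicts nothing positive at all, which connects back to Issue 1.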
I tried expanding the dataset because 1000 reviews is considered too little, but 2000 reviews does not seem to be doing the job either. Is it still too little? My goal is 5000 reviews.