I'm working on multiclass text classification. As part of this, we identified 23 labels (categories) across 14 months of data. Of these, the first 12 months are used for training and fitting the model. During modelling, the first 12 months of data are divided 80-20 into train and test sets using train_test_split.

I used SVM, random forest, and Naive Bayes, and got 70% accuracy in all scenarios. This looks pretty good.

However, when predicting on the 13th and 14th months of data (which were also classified manually), the accuracy is not even close to 10%.
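
For reference, here is a minimal sketch of the setup described above, assuming scikit-learn. The toy records, the three labels, the TF-IDF features, and the choice of LinearSVC as the SVM are all placeholders for illustration, not the real data or model. The one thing it is meant to show is that the vectorizer and classifier are fitted only on months 1-12 and then reused unchanged on months 13-14, so both accuracy figures come from the same fitted pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: one document per label per month (the real data has 23 labels).
templates = {
    "billing": "invoice payment refund charge",
    "login": "password account login reset",
    "shipping": "delivery package courier tracking",
}
records = [
    (month, text, label)
    for month in range(1, 15)
    for label, text in templates.items()
]

# Months 1-12 are used for fitting; months 13-14 are held out entirely.
fit_rows = [(text, label) for month, text, label in records if month <= 12]
holdout_rows = [(text, label) for month, text, label in records if month > 12]

X_fit, y_fit = zip(*fit_rows)
X_holdout, y_holdout = zip(*holdout_rows)

# 80-20 random split inside the first 12 months.
X_train, X_test, y_train, y_test = train_test_split(
    X_fit, y_fit, test_size=0.2, random_state=0
)

# The pipeline keeps the vectorizer and classifier together, so the
# vocabulary learned from the training split is reused at prediction time.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(X_train, y_train)

test_acc = accuracy_score(y_test, model.predict(X_test))
holdout_acc = accuracy_score(y_holdout, model.predict(X_holdout))

print(f"accuracy on 20% test split (months 1-12): {test_acc:.2f}")
print(f"accuracy on held-out months 13-14:        {holdout_acc:.2f}")
```

Comparing these two numbers, along with the label counts per month, is one way to see whether the drop comes from the later months looking different from the training months or from a mismatch in how the held-out data is preprocessed.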

How can this be handled?
