Let's say I have a classification model classify a single object multiple times, but under varying circumstances. Ideally, it should predict the same class every time, but in reality its class predictions may vary.
So given a sequence of class predictions for the single object, I'd like to measure how consistent the sequence is. To be clear, this is not about comparing predictions against some ground truth. This is about consistency within the prediction sequence itself.
- For instance, a perfectly consistent prediction sequence like `class_a, class_a, class_a, class_a` should get a perfect score.
- A less consistent sequence like `class_a, class_b, class_a, class_c` should get a lower score.
- And a completely inconsistent sequence like `class_a, class_b, class_c, class_d` should get the lowest score possible.
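To make the scoring idea concrete, here is a rough sketch of the kind of thing I have in mind (just an illustration, not a standard library routine): treat the sequence as a bag of labels and take one minus its normalized entropy, using `scipy.stats.entropy`.

```python
from collections import Counter

import numpy as np
from scipy.stats import entropy


def consistency_score(predictions):
    """Return a score in [0, 1]: 1 = all predictions identical, 0 = all different."""
    if len(predictions) < 2:
        return 1.0  # a single prediction is trivially consistent
    counts = list(Counter(predictions).values())
    # Normalize by the maximum possible entropy for this sequence length,
    # so a sequence where every prediction differs scores exactly 0.
    return 1.0 - entropy(counts) / np.log(len(predictions))


print(consistency_score(["class_a", "class_a", "class_a", "class_a"]))  # 1.0
print(consistency_score(["class_a", "class_b", "class_a", "class_c"]))  # 0.25
print(consistency_score(["class_a", "class_b", "class_c", "class_d"]))  # 0.0
```

Consistent sequences then score close to 1 and maximally inconsistent ones score 0, but this version ignores the confidences entirely.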
The goal is to find out which objects we may need to keep training the classification model on. If the model is not very consistent in its predictions for a certain object, we might need to add that object to a dataset for further training.
Preferably, it works for any number of possible classes and also takes the prediction confidences into account. The sequence `class_a (0.9), class_b (0.9), class_a (0.9), class_c (0.9)` should get a lower score than `class_a (0.9), class_b (0.2), class_a (0.8), class_c (0.3)`, since it's no good when the predictions are inconsistent with high confidence.
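One way I can picture folding the confidences in (again just a rough sketch of my own, not an existing sklearn/scipy function) is to weight each prediction by its confidence before computing the entropy, so a class predicted with high confidence pulls the distribution harder than one predicted hesitantly:

```python
from collections import defaultdict

import numpy as np
from scipy.stats import entropy


def weighted_consistency_score(predictions, confidences):
    """Like the unweighted score, but each prediction counts with its confidence."""
    weights = defaultdict(float)
    for label, confidence in zip(predictions, confidences):
        weights[label] += confidence
    return 1.0 - entropy(list(weights.values())) / np.log(len(predictions))


preds = ["class_a", "class_b", "class_a", "class_c"]
print(weighted_consistency_score(preds, [0.9, 0.9, 0.9, 0.9]))  # ~0.25 (confidently inconsistent)
print(weighted_consistency_score(preds, [0.9, 0.2, 0.8, 0.3]))  # ~0.50 (hesitantly inconsistent)
```

With uniform confidences this reduces to the unweighted score above, and the confidently inconsistent sequence scores lower than the hesitant one, which is the behaviour I'm after.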
I could build something myself, but is there a standard sklearn or scipy (or similar) function for this? Thanks in advance!
A comment on this question suggests Spearman's rank correlation coefficient or Kendall's rank correlation coefficient. I'll look into those as well.
Not sure if it's what you are looking for:
Example: