I am trying to log several metrics over my unbalanced classification dataset using the MetricTracker of the torchmetrics library. I find that the results for Precision, Recall and F1Score are always equal to the Accuracy, even though they should not be.
A minimal example reproducing this behaviour looks like this:
import torch
import torchmetrics
from torchmetrics import MetricTracker, MetricCollection
from torchmetrics import Accuracy, F1Score, Precision, Recall, CohenKappa

num_classes = 3
list_of_metrics = [Accuracy(task="multiclass", num_classes=num_classes),
                   F1Score(task="multiclass", num_classes=num_classes),
                   Precision(task="multiclass", num_classes=num_classes),
                   Recall(task="multiclass", num_classes=num_classes),
                   CohenKappa(task="multiclass", num_classes=num_classes)
                   ]
maximize_list = [True, True, True, True, True]
metric_coll = MetricCollection(list_of_metrics)
tracker = MetricTracker(metric_coll, maximize=maximize_list)

pred = torch.Tensor([[0, .1, .5],    # 2
                     [0, .1, .5],    # 2
                     [0, .1, .5],    # 2
                     [0, .1, .5],    # 2
                     [0, .1, .5],    # 2
                     [0.9, .1, .5]])  # 0
label = torch.Tensor([2, 2, 2, 2, 2, 1])

tracker.increment()
tracker.update(pred, label)
for key, val in tracker.compute_all().items():
    print(key, val)
Output:
MulticlassAccuracy tensor([0.8333])
MulticlassF1Score tensor([0.8333])
MulticlassPrecision tensor([0.8333])
MulticlassRecall tensor([0.8333])
MulticlassCohenKappa tensor([0.4545])
Does anyone know what the problem is here and how to fix it?
I am using version 0.11.1 of the torchmetrics library.
Apparently there's a documentation bug: with task="multiclass", the Precision, Recall and F1Score above are effectively computed with average="micro". In multiclass classification every misclassified sample counts as both a false positive (for the predicted class) and a false negative (for the true class), so micro-averaged precision, recall and F1 all collapse to plain accuracy, which is exactly the 0.8333 seen in the output.
The fix is to always explicitly state which average you want (e.g. "macro", "weighted" or "none") instead of relying on the default.
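For instance, a minimal adjustment of the collection from the question (a sketch, keeping everything else unchanged; only the affected metrics need the argument, and average="macro" is just one possible choice):

list_of_metrics = [Accuracy(task="multiclass", num_classes=num_classes),
                   # request macro averaging explicitly: compute the score per
                   # class first, then average over the classes
                   F1Score(task="multiclass", num_classes=num_classes, average="macro"),
                   Precision(task="multiclass", num_classes=num_classes, average="macro"),
                   Recall(task="multiclass", num_classes=num_classes, average="macro"),
                   CohenKappa(task="multiclass", num_classes=num_classes)
                   ]

With the example data above, the macro-averaged precision, recall and F1 should then drop well below the 0.8333 accuracy, since only class 2 is ever predicted correctly, while MulticlassAccuracy and MulticlassCohenKappa are unaffected.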