Imagine we have three classes: A, B, and C, and we classify a document 'd' using a standard MaxEnt classifier, and come up with the following probabilities:
P(d, A) = 0.50
P(d, B) = 0.25
P(d, C) = 0.25
I feel like that is very different, in a way, from this set of probabilities:
P(d, A) = 0.50
P(d, B) = 0.49
P(d, C) = 0.01
Is there a way to score the difference between these two?
The problem you are facing is often called measuring the "consensus" among classifiers. Since a multilabel MaxEnt model can be seen as N independent classifiers, you can think of it as a group of models "voting" for different classes.
Now, there are many measures for quantifying such "consensus"; two common ones are the entropy of the predicted distribution and the margin between the two highest class probabilities.
In general, you should think about methods of detecting "uniformity" of the resulting distribution (implying a less confident decision) or "spikiness" (indicating a more confident classification).
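As a minimal sketch (not tied to any particular library), here is how entropy and top-two margin score your two example distributions. Note that the two measures can disagree: entropy treats your second distribution as "spikier" overall (C is nearly zero), while the margin captures your intuition that A vs. B is nearly a tie there.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; higher = more uniform = less confident."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def margin(probs):
    """Gap between the two largest probabilities; smaller = less decisive top choice."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]

d1 = [0.50, 0.25, 0.25]   # clear winner: A
d2 = [0.50, 0.49, 0.01]   # A and B nearly tied

print(entropy(d1))  # 1.5 bits
print(entropy(d2))  # ~1.07 bits: d2 is less uniform overall
print(margin(d1))   # 0.25
print(margin(d2))   # ~0.01: d2 is far less decisive about the top class
```

So a single "confidence score" depends on which notion you care about: overall uniformity (entropy) or the contest between the top candidates (margin).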