Consider the following simple code with LINQ OrderBy
and ThenBy
:
static void Main()
{
var arr1 = new[] { "Alpha", "Bravo", "Charlie", };
var coStr = Comparer<string>.Create((x, y) =>
{
Console.WriteLine($"Strings: {x} versus {y}");
return string.CompareOrdinal(x, y);
});
arr1.OrderBy(x => x, coStr).ToList();
Console.WriteLine("--");
var arr2 = new[]
{
new { P = "Alpha", Q = 7, },
new { P = "Bravo", Q = 9, },
new { P = "Charlie", Q = 13, },
};
var coInt = Comparer<int>.Create((x, y) =>
{
Console.WriteLine($"Ints: {x} versus {y}");
return x.CompareTo(y);
});
arr2.OrderBy(x => x.P, coStr).ThenBy(x => x.Q, coInt).ToList();
}
This simply uses some comparers that write out to the console what they compare.
On my hardware and version of the Framework (.NET 4.6.2), this is the output:
Strings: Bravo versus Alpha Strings: Bravo versus Bravo Strings: Bravo versus Charlie Strings: Bravo versus Bravo -- Strings: Bravo versus Alpha Strings: Bravo versus Bravo Ints: 9 versus 9 Strings: Bravo versus Charlie Strings: Bravo versus Bravo Ints: 9 versus 9
My question is: Why would they compare an item from the query to itself?
In the first case, before the --
separator, they do four comparisons. Two of them compare an entry to itself ("Strings: Bravo versus Bravo"). Why?
In the second case, there should not ever be a need for resorting to comparing the Q
properties (integers); for there are no duplicates (wrt. ordinal comparison) in the P
values, so no tie-breaking from ThenBy
should be needed ever. Still we see "Ints: 9 versus 9" twice. Why use the ThenBy
comparer with identical arguments?
Note: Any comparer has to return 0
upon comparing something to itself. So unless the algorithm just wants to check if we implemented a comparer correctly (which it will never be able to do fully anyway), what is going on?
Be aware: There are no duplicates in the elements yielded by the queries in my examples.
I saw the same issue with another example with more entries yielded from the query. Above I just give a small example. This happens with an even number of elements yielded, as well.
Efficiency. Sure it would be possible to check the object isn't itself first, but that means doing an extra operation on every comparison to avoid a case that comes up relatively rarely and which in most cases is pretty cheap (most comparers are). That would be a nett loss.
(Incidentally, I did experiment with just such a change to the algorithm, and when measured it really was an efficiency loss with common comparisons such as the default
int
comparer).Who is to say there are no duplicates? The internal comparison was given two things (not necessarily reference types, so short-circuiting on reference identity isn't always an option) and has two rules to follow. The first rule needed a tie-break so the tie-break was done.
The code is designed to work for cases where there can be equivalent values for the first comparison.
If it's known that there won't be equivalent values for the
OrderBy
then it's for the person who knows that to not use an unnecessaryThenBy
, as they are the person who can potentially know that.