If you go here: the IOrderedEnumerable docs, and click on the `.Contains()` method, it takes you here: the generalised `Enumerable.Contains()` docs.

I take this to mean that it's just using the underlying `IEnumerable` implementation?

This seems strange given the potential for a more performant search, since you know you have a sorted list that you can compare against your element (e.g. doing a binary search to confirm whether the element is present, rather than enumerating the whole set)?
Am I missing anything?
It's worth noting from the beginning that the fact that a given method is only documented as operating on `IEnumerable<T>` does not mean that it isn't optimised for particular implementations or derived interfaces. In fact a great many of the methods in `Enumerable` take different paths for different derived interfaces and/or concrete implementations. The classic example here is that `Count()` takes a different path if the `IEnumerable<T>` it is called on implements `ICollection<T>` or `ICollection`. There are several further examples of this in the full framework, and even more in .NET Core, including some which take optimised paths for the implementation of `IOrderedEnumerable<T>` you get from calling `OrderBy()`.

Some of those optimisations are my doing, because my hobby these days is contributing to .NET Core, particularly to Linq, and particularly performance improvements (though obviously if I'm hacking on something I need to increase tests on the bits I'm touching, and when doing so turns up minor bugs they get prioritised over performance improvements).
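To make the `Count()` example concrete, here is a rough sketch of the kind of type test involved; it is illustrative, not the framework's actual source:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class CountSketchExample
{
    // Roughly the kind of shortcut Enumerable.Count() takes: ask a collection
    // for its size directly when it already knows it, otherwise enumerate.
    public static int CountSketch<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        if (source is ICollection<T> genericCollection)
            return genericCollection.Count;      // O(1): no enumeration needed

        if (source is ICollection nonGenericCollection)
            return nonGenericCollection.Count;   // O(1) for non-generic collections too

        int count = 0;
        using (IEnumerator<T> e = source.GetEnumerator())
            while (e.MoveNext())
                count++;                         // fallback: walk the whole sequence, O(n)
        return count;
    }
}
```

The same pattern, testing for a more capable implementation at runtime and falling back to plain enumeration, is what the `OrderBy()`-aware optimisations build on.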
When it comes to `IOrderedEnumerable`, I've done things like change `.OrderBy(someLambda).Skip(j).Take(k)` (a common paging idiom) from O(n log n) time to compute and O(j + k) time to enumerate, to O(n + k log k) time to compute and O(k) time to enumerate, and change `.OrderBy(someLambda).First()` from O(n) space and O(n log n) time to O(1) space and O(n) time, and so on.

I might look at improving other methods, and of course if I don't it's quite possible someone else would.
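As a flavour of the `First()` change, here is a minimal sketch of the underlying idea; the `FirstByKey` name and the shape of the code are mine for illustration, not the actual implementation:

```csharp
using System;
using System.Collections.Generic;

static class OrderedFirstSketch
{
    // The idea behind optimising .OrderBy(keySelector).First(): instead of buffering
    // and sorting (O(n) space, O(n log n) time), a single pass keeps the element with
    // the smallest key (O(1) space, O(n) time). The strict "<" comparison preserves
    // OrderBy's stability: among equal keys, the earliest element wins.
    public static TSource FirstByKey<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IComparer<TKey> comparer = null)
    {
        if (comparer == null) comparer = Comparer<TKey>.Default;

        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements");

            TSource best = e.Current;
            TKey bestKey = keySelector(best);
            while (e.MoveNext())
            {
                TKey key = keySelector(e.Current);
                if (comparer.Compare(key, bestKey) < 0)
                {
                    best = e.Current;
                    bestKey = key;
                }
            }
            return best;
        }
    }
}
```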
If I do look at `Contains()`, though, I would not approach it in the way you suggest.
Firstly, to have a separate overload for `IOrderedEnumerable<T>` would require adding a method to the public API, but it would only cover some cases (what we're given as an `IEnumerable<T>` may in fact be an `IOrderedEnumerable<T>` at runtime, which a compile-time overload would not catch). Much better to just have the one overload for `IEnumerable<T>` and detect the `IOrderedEnumerable<T>` case, as sketched below.
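By way of illustration, here is a rough sketch of what detecting the case inside the existing overload could look like. The `ISortedSequence<T>` interface is invented for this example and does not exist in the framework; the real `Enumerable.Contains()` already takes a shortcut of this kind for `ICollection<T>`.

```csharp
using System.Collections.Generic;

static class ContainsDetectionSketch
{
    // Invented for illustration: a hook the OrderBy() machinery *could* expose so
    // that Contains() could exploit the known sort order. No such interface exists.
    internal interface ISortedSequence<T>
    {
        // Would binary-search the sorted buffer using the sequence's own key and comparer.
        bool SortedContains(T value);
    }

    // One public overload on IEnumerable<T>, with the special cases detected inside it.
    public static bool ContainsSketch<T>(IEnumerable<T> source, T value)
    {
        if (source is ISortedSequence<T> sorted)      // hypothetical ordered path
            return sorted.SortedContains(value);

        if (source is ICollection<T> collection)      // the kind of shortcut Contains()
            return collection.Contains(value);        // already takes, e.g. for HashSet<T>

        foreach (T item in source)                    // general fallback: linear scan
            if (EqualityComparer<T>.Default.Equals(item, value))
                return true;

        return false;
    }
}
```

Note that the sorted path only works if the implementation exposes how it was sorted, which is exactly the problem the next point raises.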
Secondly, to use binary search we would have to know the means by which the `IOrderedEnumerable` was sorted. This is possible with the `OrderedEnumerable<TElement, TKey>` created by calls of `OrderBy`, but not more generally.

Thirdly, it would not be the biggest possible gain.
The current costs of `source.OrderBy(someLambda).Contains(someItem)` are as follows:

- Buffer `source`: O(n) space, O(n) time.
- Scan that buffer for `someItem`, or confirm none exists: O(n) time.

If `Contains()` was optimised to use binary search it would become:

- Buffer `source`: O(n) space, O(n) time.
- Binary-search the buffer for `someItem`, or confirm none exists: O(log n) time (average; O(n) worst case, because a precise match may sort at the same level as all elements and have to be compared with all of them).

However, that's a complete waste. If we want to optimise `Contains()` (and a great many other aggregate methods, for that matter) the optimum strategy would be to skip the ordering entirely: just call `source.Contains(someItem)` and return the result. That will at worst be O(n) time and O(1) space, though it may be O(log n) or O(1) time if `source` is, for example, a `HashSet<T>` (a case that `Contains()` is already optimised for). In both theory and practice it will end up being faster than the buffering step above.

Implementing that change would be considerably less work, and a much bigger gain.
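To see the equivalence that delegation relies on, compare the two forms a caller (or an optimised `Contains()`) could use; the sketch below is illustrative only, with invented names:

```csharp
using System.Collections.Generic;
using System.Linq;

static class OrderByContainsComparison
{
    // Illustrative only: both expressions always return the same value, because
    // sorting cannot change whether an element is present. The second form skips
    // the O(n)-space buffer that enumerating the OrderBy() result forces, and can
    // even be O(1) time when the source is something like a HashSet<int>.
    public static void Compare(IEnumerable<int> source, int item)
    {
        bool viaOrderBy = source.OrderBy(x => x).Contains(item); // buffer and sort, then scan
        bool direct = source.Contains(item);                     // scan (or better) only

        System.Diagnostics.Debug.Assert(viaOrderBy == direct);
    }
}
```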
I've considered this, and might indeed submit such a PR, but I'm not yet sure if on balance it's worth it (and hence what my opinion would be if someone else submitted such a PR), since it's almost always easier for the caller to turn `….OrderBy(foo).Contains(bar)` into `.Contains(bar)` themselves, and the check needed to optimise for such a case would be cheap, but not entirely free.