Strange behaviour of OrderBy Linq


I have a list that is ordered using the LINQ OrderBy() method, which returns an IOrderedEnumerable.

var testList = myList.OrderBy(obj => obj.ParamName);

ParamName is an object property that can hold either an integer or a string. The OrderBy above orders the list by the integer value. I then loop over testList and change the ParamName property to a string based on its integer value, as follows:

using (var sequenceEnum = testList.GetEnumerator())
{
    while (sequenceEnum.MoveNext())
    {
        sequenceEnum.Current.ParamName = GetStringForInteger(int.Parse(Convert.ToString(sequenceEnum.Current.ParamName)));
    }
}

After this loop, the order of the items is disrupted: the list is now ordered by the assigned strings rather than by the original integer ordering.

However, the ordering is preserved when I use .ToList() together with .OrderBy().

Can anyone explain what is happening here?

[Image: sample output illustration]


There are 4 answers

Bruno Belmondo

Edit: We all got your problem wrong. It sorts "the wrong way" because you are comparing "B" and "AA" and expecting "AA" to come after "B", as in Excel, which of course does not happen in alphabetical order.

Either specify an explicit comparer when ordering, or convert ParamName to an int before calling OrderBy.
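A minimal sketch of the explicit-comparer option. It assumes the labels consist only of uppercase letters (A to Z, then AA, AB, and so on); the label list is made up for illustration. Ordering by length first and then ordinally gives the Excel column order that the plain alphabetical order does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up Excel-style labels, as in the "B" vs "AA" example.
var labels = new List<string> { "AA", "B", "A", "Z", "AB" };

// Plain ordinal (alphabetical) order: "AA" and "AB" land before "B".
var alphabetical = labels.OrderBy(s => s, StringComparer.Ordinal).ToList();

// Explicit comparer: shorter labels first, then ordinal within the same length.
// Assumption: labels contain only the uppercase letters A-Z.
var excelOrder = Comparer<string>.Create((x, y) =>
{
    int byLength = x.Length.CompareTo(y.Length);
    return byLength != 0 ? byLength : string.CompareOrdinal(x, y);
});
var byColumn = labels.OrderBy(s => s, excelOrder).ToList();

Console.WriteLine(string.Join(", ", alphabetical)); // A, AA, AB, B, Z
Console.WriteLine(string.Join(", ", byColumn));     // A, B, Z, AA, AB
```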


The reason LINQ usually returns IEnumerable elements is its lazy evaluation behaviour: it evaluates the result when you need it, not when you build the query.

Calling ToList forces LINQ to evaluate the result in order to build the list.

TL;DR: be very careful when writing LINQ queries and then altering the source data before fetching the result.
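A minimal, self-contained sketch that reproduces the behaviour from the question. It only approximates the asker's setup: StrongBox<object> stands in for the item class with its object-typed ParamName, and ToLetters is a hypothetical replacement for GetStringForInteger that produces Excel-style labels. The culture is pinned to the invariant culture so the string comparison is predictable.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Runtime.CompilerServices;

// Pin the culture so that string comparisons are predictable.
CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;

// Hypothetical stand-in for GetStringForInteger: 1 -> "A", 2 -> "B", 27 -> "AA".
static string ToLetters(int n)
{
    string label = "";
    while (n > 0)
    {
        n--;
        label = (char)('A' + n % 26) + label;
        n /= 26;
    }
    return label;
}

// StrongBox<object>.Value plays the role of the object-typed ParamName.
var myList = new List<StrongBox<object>>
{
    new StrongBox<object>(27),
    new StrongBox<object>(2),
    new StrongBox<object>(1),
};

// Deferred: testList only describes the sort, nothing has been sorted yet.
var testList = myList.OrderBy(item => item.Value);

// This pass sorts by the int values (1, 2, 27) and replaces them with labels.
foreach (var item in testList)
    item.Value = ToLetters((int)item.Value);

// Enumerating testList again re-runs the sort, now on the strings.
var reSorted = testList.Select(item => (string)item.Value).ToList();
Console.WriteLine(string.Join(", ", reSorted)); // A, AA, B

// With ToList() the sorted order is captured once and survives the mutation.
var myList2 = new List<StrongBox<object>>
{
    new StrongBox<object>(27),
    new StrongBox<object>(2),
    new StrongBox<object>(1),
};
var snapshot = myList2.OrderBy(item => item.Value).ToList();
foreach (var item in snapshot)
    item.Value = ToLetters((int)item.Value);

var kept = snapshot.Select(item => (string)item.Value).ToList();
Console.WriteLine(string.Join(", ", kept)); // A, B, AA
```

In the current .NET implementation, OrderBy computes all keys when an enumeration starts, so the first foreach still visits the items in integer order; it is the later enumeration of testList that sorts again by the new strings, while the ToList() snapshot keeps the original order.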

Sergy93

The reason is the deferred execution of queries in EF: the actual query to the database is not made until you explicitly load the results into memory, via .ToList() for example.

As you said, .OrderBy() returns an IOrderedEnumerable, which works with the foreach idiom. So why not simplify the loop to something like the following?

foreach (var item in testList)
{
    item.ParamName = GetStringForInteger(int.Parse(Convert.ToString(item.ParamName)));
}
Ghasan غسان

As everyone here has mentioned, that is because LINQ is lazily evaluated. You can read more here: https://blogs.msdn.microsoft.com/ericwhite/2006/10/04/lazy-evaluation-and-in-contrast-eager-evaluation/

What you want to do is probably this:

var testList = myList.OrderBy(obj => obj.ParamName).Select(obj =>
{
    obj.ParamName = GetStringForInteger(int.Parse(Convert.ToString(obj.ParamName)));
    return obj;
});
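A small sketch of why the Select above is still lazy: the lambda, and therefore its side effect, runs only when the query is enumerated, and it runs again on every enumeration. The counter sideEffects is a hypothetical stand-in for the assignment to obj.ParamName.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new List<int> { 3, 1, 2 };
int sideEffects = 0;

// Building the query runs neither the sort nor the Select lambda.
var query = source.OrderBy(n => n).Select(n =>
{
    sideEffects++;   // hypothetical stand-in for "obj.ParamName = ..."
    return n * 10;
});
int afterBuild = sideEffects;    // 0

var first = query.ToList();      // [10, 20, 30]
int afterFirst = sideEffects;    // 3: once per element

var second = query.ToList();
int afterSecond = sideEffects;   // 6: the side effect ran again

Console.WriteLine($"{afterBuild} {afterFirst} {afterSecond}"); // 0 3 6
```

With the conversion from the question, a second enumeration would sort by the already-converted strings and then call int.Parse on them, so it is probably safest to enumerate such a query only once, for example by ending it with .ToList().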
Harald Coppoolse

An IEnumerable object does not represent a sequence of objects itself; it represents the algorithm needed to give you, on request, the first element of the sequence as the "current element", and then the element after the current one.
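To illustrate "the algorithm that gives you the next element on request", here is a sketch using a C# iterator method (the name Numbers is made up): its body runs only as far as the caller's MoveNext calls require.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// An iterator: calling Numbers() runs none of this body yet.
IEnumerable<int> Numbers()
{
    for (int i = 1; i <= 3; i++)
    {
        log.Add($"producing {i}");
        yield return i;
    }
}

IEnumerable<int> sequence = Numbers();
int producedBeforeMoveNext = log.Count;   // 0

using IEnumerator<int> e = sequence.GetEnumerator();
e.MoveNext();                             // runs the body up to the first yield
int current = e.Current;                  // 1
int producedAfterMoveNext = log.Count;    // 1

Console.WriteLine($"{producedBeforeMoveNext} {current} {producedAfterMoveNext}"); // 0 1 1
```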

When LINQ was designed, it was decided that LINQ would use deferred execution, often called lazy evaluation. In the MSDN documentation of the Enumerable methods that use deferred execution, you will find the following phrase:

This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach.

If you create the IEnumerable and then change the objects it acts on, the change might influence the result. This is comparable to a function that returns a different value when the variables it reads are changed:

int x = 4;
int y = 5;
int MyFunction()
{
    return x + y;
}

int a = MyFunction();
y = 7;
int b = MyFunction();

Now b does not equal a (a is 9, b is 11). The same holds for your IEnumerable:

List<...> myList = CreateMySequence();
IEnumerable<...> myOrder = myList.OrderBy(...);

myOrder does not contain the result; it is like a function that can calculate the result. If you change one of the inputs that myOrder uses, the result might change:

myList.Add(someElement);
var myResult = myOrder.ToList();

myResult now includes someElement, because you changed the input that the query works on.
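A concrete, runnable version of the myList/myOrder example, assuming a plain List<int> in place of the elided element type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var myList = new List<int> { 5, 3 };
IEnumerable<int> myOrder = myList.OrderBy(n => n);   // only a description of the query

var before = myOrder.ToList();     // [3, 5]
myList.Add(1);                     // change the input after creating myOrder
var myResult = myOrder.ToList();   // [1, 3, 5]: the query ran again on the new data

Console.WriteLine(string.Join(", ", before));    // 3, 5
Console.WriteLine(string.Join(", ", myResult));  // 1, 3, 5
```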

Deferred execution exists because quite often you don't need to enumerate all elements of the sequence. In the following cases it would be a waste of processing time to create the complete sequence:

  • I want only the first element
  • I want to skip 3 elements and then take two elements
  • I want the first element with a value of x
  • I want to know if the sequence contains any element at all
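A rough way to see this is to count how many elements each query pulls from its source. Source and PulledBy are hypothetical helpers for this sketch; the counts in the comments are what LINQ to Objects requests from the source for each of the cases listed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// Hypothetical source that counts how many elements are requested from it.
IEnumerable<int> Source()
{
    for (int i = 1; i <= 1000; i++)
    {
        pulled++;
        yield return i;
    }
}

// Resets the counter, runs a query and reports how many elements it pulled.
int PulledBy(Action query)
{
    pulled = 0;
    query();
    return pulled;
}

int firstOnly = PulledBy(() => Source().First());                   // 1
int skipTake = PulledBy(() => Source().Skip(3).Take(2).ToList());   // 5
int firstMatch = PulledBy(() => Source().First(n => n == 10));      // 10
int anyAtAll = PulledBy(() => Source().Any());                      // 1

Console.WriteLine($"{firstOnly} {skipTake} {firstMatch} {anyAtAll}"); // 1 5 10 1
```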

Of course there are functions that need to create the complete sequence as soon as you ask for the first element:

  • If you want the first in a sorted sequence, all elements have to be sorted in order to find the first one.
  • If you want the first element of a group of elements where all elements in the group have the same value of a certain property X (Enumerable.GroupBy)
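A sketch with a hypothetical counting source: both queries below have to consume the entire source before they can return their first result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// Hypothetical source that counts how many elements are requested from it.
IEnumerable<int> Source()
{
    for (int i = 1; i <= 1000; i++)
    {
        pulled++;
        yield return i;
    }
}

// The smallest element can only be known after looking at all of them.
int smallest = Source().OrderBy(n => n).First();
int pulledByOrderBy = pulled;                       // 1000

// GroupBy builds all groups before it can hand out the first one.
pulled = 0;
var firstGroup = Source().GroupBy(n => n % 2).First();
int pulledByGroupBy = pulled;                       // 1000

Console.WriteLine($"{smallest} {pulledByOrderBy} {firstGroup.Key} {pulledByGroupBy}"); // 1 1000 1 1000
```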

As a rule of thumb, keep sequences as IEnumerable as long as possible, until you either need the results or the sources used to create the sequence are about to change.

The latter is important when fetching data from a database, a file, or the internet: you have to materialize the sequence (for example with ToList()) before your connection is closed.

The following won't work:

using (var myDbContext = new MyDbContext())
{
    return myDbContext.Customers.Where(customer => customer.Age > 18);
}

The database query is not executed until after myDbContext has been disposed at the end of the using statement, so you'll get an exception as soon as you ask for any element of the sequence.
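EF is not needed to see the effect. In this sketch, a StringReader stands in for the database connection (an assumption for illustration; AdultsLazy and AdultsEager are made-up names): returning the lazy query from inside the using block fails once it is enumerated, while calling ToList() inside the block works.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Lazily reads lines; nothing is read until the sequence is enumerated.
static IEnumerable<string> ReadLines(TextReader reader)
{
    string? line;
    while ((line = reader.ReadLine()) != null)
        yield return line;
}

// Like the DbContext example: the reader is disposed before the query runs.
static IEnumerable<string> AdultsLazy(string data)
{
    using (var reader = new StringReader(data))
    {
        return ReadLines(reader).Where(age => int.Parse(age) > 18);
    }
}

// Materializing inside the using block runs the query while the reader is open.
static List<string> AdultsEager(string data)
{
    using (var reader = new StringReader(data))
    {
        return ReadLines(reader).Where(age => int.Parse(age) > 18).ToList();
    }
}

string input = "12\n25\n40";

List<string> eager = AdultsEager(input);   // ["25", "40"]

bool lazyFailed = false;
try
{
    AdultsLazy(input).ToList();
}
catch (ObjectDisposedException)
{
    lazyFailed = true;                     // the reader was already disposed
}

Console.WriteLine($"{string.Join(", ", eager)} {lazyFailed}"); // 25, 40 True
```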