Entity Framework / LINQ: how to query the first two date sets per account from an ordered, grouped list


I have a large table with 6,000,000 records in this format: (Acc, sDate, Serial, Amount, ...). Acc, sDate and Serial together form the primary key.

To show my problem, I created a small code sample:

public class Cheque 
{
    public string Account{ get; set; }
    public string Serial{ get; set; }   
    public string StartDate { get; set; }
    // ... public string Amount { get; set; }    ...
}

var list = new List<Cheque>();
list.Add(new Cheque() { Account= "1", Serial = "1", StartDate = "20080120"});
list.Add(new Cheque() { Account= "1", Serial= "2", StartDate = "20080120" });
list.Add(new Cheque() { Account= "1", Serial= "3", StartDate = "20080120" }); 
list.Add(new Cheque() { Account= "1", Serial= "4", StartDate = "20080120" }); 
// each account has 100 to 300 records per date; only a few objects are added here for simplicity

list.Add(new Cheque() { Account= "1", Serial= "1", StartDate = "20110120" });
list.Add(new Cheque() { Account= "1", Serial= "2", StartDate = "20110120" });

list.Add(new Cheque() { Account= "1", Serial= "1", StartDate = "20120120" });
list.Add(new Cheque() { Account= "1", Serial= "2", StartDate = "20120120" });
list.Add(new Cheque() { Account= "1", Serial= "3", StartDate = "20120120" });

list.Add(new Cheque() { Account= "2", Serial= "1", StartDate = "20100417" });
list.Add(new Cheque() { Account= "2", Serial= "2", StartDate = "20100417" });

list.Add(new Cheque() { Account= "2", Serial= "1", StartDate = "20120314" });

list.Add(new Cheque() { Account= "2", Serial= "1", StartDate = "20070301" });
list.Add(new Cheque() { Account= "2", Serial= "1", StartDate = "20070301" });
list.Add(new Cheque() { Account= "2", Serial= "1", StartDate = "20070301" });

The expected list contains only the two sets with the most recent dates for each Account:

Acc Serial Date

"1", "1", "20120120"   //first resultSet with Account= 1 
"1", "2", "20120120" 
"1", "3", "20120120"
"1", "1", "20110120"  //second resultset with Account=  1 
"1", "2", "20110120" 
"2", "1", "20120314"  //first resultSet with Account= 2 
"2", "1", "20100417" //second resultset with Account=  2 
"2", "2", "20100417" 

Please help: how can I query this with LINQ, i.e. group by (or distinct) and take the first two sets per account, as shown above?

4

There are 4 answers

9
Gert Arnold On

The trick is to group by Account and Serial. Take the top two dates and then flatten the list again by SelectMany:

list.GroupBy(x => new { x.Account, x.Serial })
    .Select(g => new
    {
        FirstTwo = g.GroupBy(x => x.StartDate)
                    .Select(x => x.FirstOrDefault())
                    .OrderByDescending(x => x.StartDate)
                    .Take(2)
    })
    .SelectMany(g => g.FirstTwo)
    .OrderBy(x => x.Account)
    .ThenByDescending(x => x.StartDate)
    .ThenBy(x => x.Serial)

Result:

1   1   20120120
1   2   20120120
1   3   20120120
1   1   20110120
1   2   20110120
1   3   20110120
2   1   20120314
2   2   20120314
2   1   20100417
2   2   20100417
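
The expected output in the question keeps every cheque belonging to an account's two most recent dates. Under that reading, grouping by Account first and then by StartDate inside each account is one way to express it. The following is only a sketch for in-memory LINQ to Objects: the helper name `LatestTwoDateSets` and the class `ChequeQueries` are made up for illustration, it assumes `StartDate` is always a `yyyyMMdd` string (so ordinal string order matches date order), and it has not been checked against Entity Framework's query translation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Cheque
{
    public string Account { get; set; } = "";
    public string Serial { get; set; } = "";
    public string StartDate { get; set; } = "";
}

public static class ChequeQueries
{
    // Hypothetical helper: for each account, keep all cheques whose StartDate
    // is one of that account's two latest dates.
    public static List<Cheque> LatestTwoDateSets(IEnumerable<Cheque> cheques)
    {
        return cheques
            .GroupBy(c => c.Account)
            .OrderBy(account => account.Key, StringComparer.Ordinal)
            .SelectMany(account => account
                // one sub-group per distinct date within the account
                .GroupBy(c => c.StartDate)
                // assumes yyyyMMdd strings, so ordinal order is date order
                .OrderByDescending(date => date.Key, StringComparer.Ordinal)
                .Take(2)
                // flatten the two date sets back into cheques
                .SelectMany(date => date.OrderBy(c => c.Serial, StringComparer.Ordinal)))
            .ToList();
    }

    public static void Main()
    {
        var list = new List<Cheque>
        {
            new Cheque { Account = "1", Serial = "1", StartDate = "20080120" },
            new Cheque { Account = "1", Serial = "1", StartDate = "20110120" },
            new Cheque { Account = "1", Serial = "1", StartDate = "20120120" },
            new Cheque { Account = "1", Serial = "2", StartDate = "20120120" },
            new Cheque { Account = "2", Serial = "1", StartDate = "20070301" },
            new Cheque { Account = "2", Serial = "1", StartDate = "20100417" },
            new Cheque { Account = "2", Serial = "1", StartDate = "20120314" },
        };

        foreach (var c in LatestTwoDateSets(list))
            Console.WriteLine($"{c.Account} {c.Serial} {c.StartDate}");
    }
}
```

Note that `Serial` is a string here, so the inner ordering is textual ("10" sorts before "2"); a numeric key would be needed if that matters.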
0
AliKarimi On

After searching and reading Stack Overflow, I got the desired result with this code.

    var groupedList = (from t in list
                       group t by new { t.Account, t.StartDate } into g
                       select new
                       {                              
                          g.Key.Account,
                          g.Key.StartDate
                        });

    var filteredList = groupedList.GroupBy(x => x.Account)
            .SelectMany(g => (from t in g orderby t.StartDate descending select t)
                     .Take(2).ToList() );

    var Result = (from c in list
                  join k in filteredList on
                  new { c.StartDate, c.Account } equals
                  new { k.StartDate, k.Account } //into j
                  select c).ToList();
        /*  or lambda method chain
        var groupedList = list.GroupBy(t => new {t.StartDate, t.Account})
            .Select(g => new { g.Key.StartDate,g.Key.Account})
            .GroupBy(x => x.Account)
            .SelectMany(g => (from t in g orderby t.StartDate descending select t)
                        .Take(2).ToList() );

          var result = (list.Join(inner: groupedList, 
            outerKeySelector: c => new {c.StartDate, c.Account}, 
            innerKeySelector: k => new {k.StartDate, k.Account},
            resultSelector: (c, k) => c))
            .OrderBy(e => e.Account).ThenByDescending(e => e.StartDate).ToList();

        */

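      // print each cheque; Console.WriteLine(Result) would only print the list's type name
      foreach (var c in Result)
          Console.WriteLine(c.Account + " " + c.Serial + " " + c.StartDate);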

Many thanks to LINQPad (the best tool for LINQ) and to everyone on Stack Overflow (the best and most professional developers in the world).

But I suspect my code is overly complex (three levels of filtering) and does not perform as well as it could. :)

If anyone has a better suggestion, please let me know.

I'd love to get some improvements!
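
One possible simplification, offered as a sketch and not as a measured improvement: instead of joining back to the list, build a per-account set of the two latest dates and filter the list in a single scan. The names `ChequeFilter` and `KeepLatestTwoDates` are invented for this example, the code assumes in-memory data with non-null `Account` values and `yyyyMMdd` date strings, and whether it is actually faster on 6,000,000 rows would have to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Cheque
{
    public string Account { get; set; } = "";
    public string Serial { get; set; } = "";
    public string StartDate { get; set; } = "";
}

public static class ChequeFilter
{
    // Hypothetical helper: two passes over the data, no join.
    public static List<Cheque> KeepLatestTwoDates(IReadOnlyCollection<Cheque> cheques)
    {
        // Pass 1: for every account, remember its two latest distinct dates.
        Dictionary<string, HashSet<string>> keep = cheques
            .GroupBy(c => c.Account)
            .ToDictionary(
                g => g.Key,
                g => new HashSet<string>(
                    g.Select(c => c.StartDate)
                     .Distinct()
                     // assumes yyyyMMdd strings, so ordinal order is date order
                     .OrderByDescending(d => d, StringComparer.Ordinal)
                     .Take(2)));

        // Pass 2: one scan with hash lookups; the input order is preserved.
        return cheques
            .Where(c => keep[c.Account].Contains(c.StartDate))
            .ToList();
    }

    public static void Main()
    {
        var list = new List<Cheque>
        {
            new Cheque { Account = "1", Serial = "1", StartDate = "20080120" },
            new Cheque { Account = "1", Serial = "1", StartDate = "20110120" },
            new Cheque { Account = "1", Serial = "1", StartDate = "20120120" },
            new Cheque { Account = "2", Serial = "1", StartDate = "20100417" },
        };

        foreach (var c in KeepLatestTwoDates(list))
            Console.WriteLine($"{c.Account} {c.Serial} {c.StartDate}");
    }
}
```

Because the result keeps the input order, an explicit `OrderBy(...).ThenByDescending(...)` is still needed afterwards if the rows should come out sorted by account and date.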

3
Incredible On

To get the top two from each group, the query would look like the one below. Update: in this case, however, the combination of Account and StartDate must be unique.

.ToList()
.GroupBy(x => new { x.Account, x.StartDate })
.SelectMany(y => y.OrderByDescending(z => z.StartDate).Take(2));

I use a similar query in my own code and know it works fine.

0
AliKarimi On

At last I found a single statement that produces the desired result.

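var result = (from cheque in list.OrderBy(a => a.Account).ThenByDescending(a => a.StartDate)
              // first level: one group per (Account, StartDate), in the sorted order
              group cheque by new { cheque.Account, cheque.StartDate } into gr
              // second level: collect the date groups of each account
              group gr by gr.Key.Account into secondGrouping
              // keep the first two date groups per account (Distinct() was redundant here)
              from second in secondGrouping.Take(2)
              // flatten the kept groups back into individual cheques
              from f in second
              select f).ToList();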