How do you make EntityFramework generate efficient SQL queries for related objects?

225 views Asked by At

I am trying to work out how to use the .NET EntityFramework to generate both readable and natural code and efficient SQL query statements when fetching related entities. For example, given the following code-first definition

public class WidgetContext : DbContext
{
    public DbSet<Widget> Widgets { get; set; }
    public DbSet<Gizmo> Gizmos { get; set; }
}

public class Widget
{
    public virtual int Id { get; set; }
    [Index]
    [MaxLength(512)]
    public virtual string Name { get; set; }
    public virtual ICollection<Gizmo> Gizmos { get; set; }
}

public class Gizmo
{
    public virtual long Id { get; set; }
    [Index]
    [MaxLength(512)]
    public virtual string Name { get; set; }
    public virtual Widget Widget { get; set; }
    public virtual int WidgetId { get; set; }

}

I want to be able to write code like

using (var wc = new WidgetContext())
{
    var widget = wc.Widgets.First(x => x.Id == 123);
    var gizmo = widget.Gizmos.First(x => x.Name == "gizmo 99");
}

and see a SQL query created along the lines of

SELECT TOP (1) * from Gizmos WHERE WidgetId = 123 AND Name = 'gizmo 99'

So that the work of picking the right Gizmo is performed by the database. This is important because in my use case each Widget could have thousands of related Gizmos and in a particular request I only need to retrieve one at a time. Unfortunately the code above causes the EntityFramework to create SQL like this instead

SELECT * from Gizmos WHERE WidgetId = 123

The match on Gizmo.Name is then being performed in memory by scanning the complete set of related Gizmo entities.

After a good deal of experimentation, I have found ways of creating the efficient SQL use I am looking for in the entity framework, but only by using ugly code which is much less natural to write. The example below illustrates this.

using System.Data.Entity;
using System.Data.Entity.Core.Objects.DataClasses;
using System.Linq;
static void Main(string[] args)
{
    Database.SetInitializer(new DropCreateDatabaseAlways<WidgetContext>());

    using (var wc = new WidgetContext())
    {
        var widget = new Widget() { Name = "my widget"};
        wc.Widgets.Add(widget);
        wc.SaveChanges();
    }

    using (var wc = new WidgetContext())
    {
        var widget = wc.Widgets.First();
        for (int i = 0; i < 1000; i++)
            widget.Gizmos.Add(new Gizmo() { Name = string.Format("gizmo {0}", i) });
        wc.SaveChanges();
    }

    using (var wc = new WidgetContext())
    {
        wc.Database.Log = Console.WriteLine;
        var widget = wc.Widgets.First();

        Console.WriteLine("=====> Query 1");
        // queries all gizmos associated with the widget and then runs the 'First' query in memory. Nice code, ugly database usage
        var g1 = widget.Gizmos.First(x => x.Name == "gizmo 99");

        Console.WriteLine("=====> Query 2");
        // queries on the DB with two terms in the WHERE clause - only pulls one record, good SQL, ugly code
        var g2 = ((EntityCollection<Gizmo>) widget.Gizmos).CreateSourceQuery().First(x => x.Name == "gizmo 99");

        Console.WriteLine("=====> Query 3");
        // queries on the DB with two terms in the WHERE clause - only pulls one record, good SQL, ugly code
        var g3 = wc.Gizmos.First(x => x.Name == "gizmo 99" && x.WidgetId == widget.Id);

        Console.WriteLine("=====> Query 4");
        // queries on the DB with two terms in the WHERE clause - only pulls one record, also good SQL, ugly code
        var g4 = wc.Entry(widget).Collection(x => x.Gizmos).Query().First(x => x.Name == "gizmo 99");
    }

    Console.ReadLine();
}

Query 1 demonstrates the 'fetch everything and filter' approach that is generated by the natural usage of the entity objects.

Queries 2,3 and 4 above all generate what I would consider to be an efficient SQL query - one that returns a single row and has two terms in the WHERE clause, but they all involve very stilted C# code.

Does anyone have a solution that will allow natural C# code to be written and generate efficient SQL utilization in this case?

I should note that I have tried replacing ICollection with EntityCollection in my Widget object to allow the cast to be removed from the Query 2 code above. Unfortunately this leads to an EntityException telling me that

The object could not be added to the EntityCollection or EntityReference. An object that is attached to an ObjectContext cannot be added to an EntityCollection or EntityReference that is not associated with a source object.

when I try to retrieve any related objects.

Any suggestions appreciated.

1

There are 1 answers

0
RobinG On

Ok, further digging has let me get as close as I think is possible to where I want to be (which, to reiterate, is code that looks OO but generates efficient DB usage patterns).

It turns out that Query2 above (casting the related collection to an EntityCollection) actually isn't a good solution, since although it generates the desired query type against the database, the mere act of fetching the Gizmos collection from the widget is enough to make the entity framework go off to the database and fetch all related Gizmos - i.e. performing the query that I am trying to avoid.

However, it's possible to get the EntityCollection for a relationship without calling the getter of the collection property, as described here http://blogs.msdn.com/b/alexj/archive/2009/06/08/tip-24-how-to-get-the-objectcontext-from-an-entity.aspx. This approach sidesteps the entity framework fetching related entities when you access the Gizmos collection property.

So, an additional read-only property on the Widget can be added like this

 public IQueryable<Gizmo> GizmosQuery
 {
     get
     {
         var relationshipManager = ((IEntityWithRelationships)this).RelationshipManager;
         return (IQueryable<Gizmo>) relationshipManager.GetAllRelatedEnds().First( x => x is EntityCollection<Gizmo>).CreateSourceQuery();
     }
 }

and then the calling code can look like this

var g1 = widget.GizmosQuery.First(x => x.Name == "gizmo 99");

This approach generates SQL that efficiently fetches only a single row from the database, but depends on the following conditions holding true

  • Only one relationship from the source to the target type. Having multiple relationships linking a Widget to Gizmos would mean a more complicated predicate would be needed in the .First() call in GizmosQuery.
  • Proxy creation is enabled for the DbContext and the Widget class is eligible for proxy generation (https://msdn.microsoft.com/en-us/library/vstudio/dd468057%28v=vs.100%29.aspx)
  • The GizmosQuery property must not be called on objects that are newly created using new Widget() since these will not be proxies and will not implement IEntityWithRelationships. New objects that are valid proxies can be created using wc.Widgets.Create() instead if necessary.