I am trying to work out how to use the .NET EntityFramework to generate both readable and natural code and efficient SQL query statements when fetching related entities. For example, given the following code-first definition
public class WidgetContext : DbContext
{
public DbSet<Widget> Widgets { get; set; }
public DbSet<Gizmo> Gizmos { get; set; }
}
public class Widget
{
public virtual int Id { get; set; }
[Index]
[MaxLength(512)]
public virtual string Name { get; set; }
public virtual ICollection<Gizmo> Gizmos { get; set; }
}
public class Gizmo
{
public virtual long Id { get; set; }
[Index]
[MaxLength(512)]
public virtual string Name { get; set; }
public virtual Widget Widget { get; set; }
public virtual int WidgetId { get; set; }
}
I want to be able to write code like
using (var wc = new WidgetContext())
{
var widget = wc.Widgets.First(x => x.Id == 123);
var gizmo = widget.Gizmos.First(x => x.Name == "gizmo 99");
}
and see a SQL query created along the lines of
SELECT TOP (1) * from Gizmos WHERE WidgetId = 123 AND Name = 'gizmo 99'
So that the work of picking the right Gizmo is performed by the database. This is important because in my use case each Widget could have thousands of related Gizmos and in a particular request I only need to retrieve one at a time. Unfortunately the code above causes the EntityFramework to create SQL like this instead
SELECT * from Gizmos WHERE WidgetId = 123
The match on Gizmo.Name is then being performed in memory by scanning the complete set of related Gizmo entities.
After a good deal of experimentation, I have found ways of creating the efficient SQL use I am looking for in the entity framework, but only by using ugly code which is much less natural to write. The example below illustrates this.
using System.Data.Entity;
using System.Data.Entity.Core.Objects.DataClasses;
using System.Linq;
static void Main(string[] args)
{
Database.SetInitializer(new DropCreateDatabaseAlways<WidgetContext>());
using (var wc = new WidgetContext())
{
var widget = new Widget() { Name = "my widget"};
wc.Widgets.Add(widget);
wc.SaveChanges();
}
using (var wc = new WidgetContext())
{
var widget = wc.Widgets.First();
for (int i = 0; i < 1000; i++)
widget.Gizmos.Add(new Gizmo() { Name = string.Format("gizmo {0}", i) });
wc.SaveChanges();
}
using (var wc = new WidgetContext())
{
wc.Database.Log = Console.WriteLine;
var widget = wc.Widgets.First();
Console.WriteLine("=====> Query 1");
// queries all gizmos associated with the widget and then runs the 'First' query in memory. Nice code, ugly database usage
var g1 = widget.Gizmos.First(x => x.Name == "gizmo 99");
Console.WriteLine("=====> Query 2");
// queries on the DB with two terms in the WHERE clause - only pulls one record, good SQL, ugly code
var g2 = ((EntityCollection<Gizmo>) widget.Gizmos).CreateSourceQuery().First(x => x.Name == "gizmo 99");
Console.WriteLine("=====> Query 3");
// queries on the DB with two terms in the WHERE clause - only pulls one record, good SQL, ugly code
var g3 = wc.Gizmos.First(x => x.Name == "gizmo 99" && x.WidgetId == widget.Id);
Console.WriteLine("=====> Query 4");
// queries on the DB with two terms in the WHERE clause - only pulls one record, also good SQL, ugly code
var g4 = wc.Entry(widget).Collection(x => x.Gizmos).Query().First(x => x.Name == "gizmo 99");
}
Console.ReadLine();
}
Query 1 demonstrates the 'fetch everything and filter' approach that is generated by the natural usage of the entity objects.
Queries 2,3 and 4 above all generate what I would consider to be an efficient SQL query - one that returns a single row and has two terms in the WHERE clause, but they all involve very stilted C# code.
Does anyone have a solution that will allow natural C# code to be written and generate efficient SQL utilization in this case?
I should note that I have tried replacing ICollection with EntityCollection in my Widget object to allow the cast to be removed from the Query 2 code above. Unfortunately this leads to an EntityException telling me that
The object could not be added to the EntityCollection or EntityReference. An object that is attached to an ObjectContext cannot be added to an EntityCollection or EntityReference that is not associated with a source object.
when I try to retrieve any related objects.
Any suggestions appreciated.
Ok, further digging has let me get as close as I think is possible to where I want to be (which, to reiterate, is code that looks OO but generates efficient DB usage patterns).
It turns out that Query2 above (casting the related collection to an
EntityCollection
) actually isn't a good solution, since although it generates the desired query type against the database, the mere act of fetching theGizmos
collection from thewidget
is enough to make the entity framework go off to the database and fetch all related Gizmos - i.e. performing the query that I am trying to avoid.However, it's possible to get the
EntityCollection
for a relationship without calling the getter of the collection property, as described here http://blogs.msdn.com/b/alexj/archive/2009/06/08/tip-24-how-to-get-the-objectcontext-from-an-entity.aspx. This approach sidesteps the entity framework fetching related entities when you access theGizmos
collection property.So, an additional read-only property on the Widget can be added like this
and then the calling code can look like this
This approach generates SQL that efficiently fetches only a single row from the database, but depends on the following conditions holding true
.First()
call inGizmosQuery
.DbContext
and the Widget class is eligible for proxy generation (https://msdn.microsoft.com/en-us/library/vstudio/dd468057%28v=vs.100%29.aspx)new Widget()
since these will not be proxies and will not implementIEntityWithRelationships
. New objects that are valid proxies can be created usingwc.Widgets.Create()
instead if necessary.