SQL query performance degradation after encrypting column

798 views Asked by At

I have a .net framework 4.7.2 app with entity framework 6.1.2. The app uses an azure sql database which has -among others- a table with encrypted data. We have used the Always Encrypt feature of sql server in order to encrypt these data. The database schema is shown below.

enter image description here

The field Description in table Order is nvarchar(100) and the field Description in Note is nvarchar(100) and encrypted with Always Encrypt (deterministic encryption). Both columns are nullable and have a non-clustered index on them. Both tables have a few tens of millions of records.

We have the following query in our code: var user = DbContext.Set<User>().FirstOrDefault(u => u.Notes.Any(n => n.Description == value)); This query used to run with with normal performance cost (few dozens msecs) before encrypting the field in our db. This completely changed after the encryption. The execution time changed to a few tens of secs.

I rebuilt all indexes but it changed nothing.The above code statement is translated from entity framework into something like:

DECLARE @p__linq__0 nvarchar(100) = 'some text'
SELECT *
FROM  [User]
WHERE  EXISTS (SELECT 1 AS [C1]
                FROM [Note] 
                WHERE ([User].[Id] = [Note].[UserId]) AND (([Note].[Description] = @p__linq__0) OR (([Note].[Description] IS NULL) AND (@p__linq__0 IS NULL))) )

Its execution plan in the database is included below. From this plan it is obvious that the index in Description is not used and a Clustered Index Scan is performed. That's why the performance of the query is poor.

enter image description here

The tricky part is that if we remove the OR (([Note].[Description] IS NULL) AND (@p__linq__0 IS NULL))) part of the where clause the execution of the query happens instantly and the execution plan is the expected one (see below)

DECLARE @p__linq__0 nvarchar(100) = 'some text'
SELECT *
FROM  [User]
WHERE  EXISTS (SELECT 1 AS [C1]
                FROM [Note] 
                WHERE ([User].[Id] = [Note].[UserId]) AND ([Note].[Description] = @p__linq__0) )

enter image description here

What is most interesting is that if we remove the ([Note].[Description] IS NULL) !!! part of the where clause the performance degrades again in its poor state and the execution plan of the query is the previous one.

DECLARE @p__linq__0 nvarchar(100) = 'some text'
SELECT *
FROM  [User]
WHERE  EXISTS (SELECT 1 AS [C1]
                FROM [Note] 
                WHERE ([User].[Id] = [Note].[UserId]) AND (([Note].[Description] = @p__linq__0) OR ((@p__linq__0 IS NULL))) )
            

If we do the same exact queries to towards the Order table which has exactly the same schema as Note but its Description field is not encrypted, the performance and the execution plans as expected in all cases.

I have seen the below relevant questions but they do not refer to encrypted columns. They refer to default entity framework behavior. 1 2

Thus, the question is: What is the impact of data encryption (with always encrypted) in the above case and how could we overcome it?

3

There are 3 answers

0
Manolis On BEST ANSWER

Finally the issue had nothing to do with data encryption. The actual issue came from the fact that the statistics of the database were out of date after a big data migration. We updated them using EXEC sp_updatestats and after that the query was executed using the expected execution plan and with expected performance.

1
David Browne - Microsoft On

This is how EF replicates C# null comparison semantics in SQL. This is optional, and I always turn it off.

Gets or sets a value indicating whether database null semantics are exhibited when comparing two operands, both of which are potentially nullable. The default value is false.

For example (operand1 == operand2) will be translated as:

(operand1 = operand2)

if UseDatabaseNullSemantics is true, respectively

(((operand1 = operand2) AND (NOT (operand1 IS NULL OR operand2 IS NULL))) OR ((operand1 IS NULL) AND (operand2 IS NULL)))

DbContextConfiguration.UseDatabaseNullSemantics

In your DbContext constructor simply set:

this.Configuration.UseDatabaseNullSemantics = true;

In EF Core, the configuration is done on the OptionsBuilder, eg

optionsBuilder.UseSqlServer(constr, o => o.UseRelationalNulls());
1
Panagiotis Kanavos On

I suspect this has nothing to do with encryption, and everything to do with this catch-all clause :

AND (([Note].[Description] = @p__linq__0) OR ((@p__linq__0 IS NULL)))

Such catch-all clauses are a common but unfortunate way of trying to use "optional" parameters in stored procedures. Unfortunately, they cause performance issues just like this one. The server caches the execution plan the first time a query runs and reuses it in subsequent calls. The execution plan for a query that doesn't need to search a specific column though will be very different from one that does.

If the first call used null for the parameter, the generated execution plan wouldn't use any indexes that covered that column - why should it?

There's no need for such tricks when using ORMs like EF, especially when LINQ is involved.

I suspect the EF query (that wasn't posted) contains a Where call like this :

.Where(note=> note.Description == text || text ==null);

You can chain LINQ calls which means that you can add a condition only if it's needed . The catch-all call can be replaced with :

if (text != null)
{
    query=query.Where(note => note.Description == text);
}