I have a .net framework 4.7.2 app with entity framework 6.1.2. The app uses an azure sql database which has -among others- a table with encrypted data. We have used the Always Encrypt feature of sql server in order to encrypt these data. The database schema is shown below.
The field Description in table Order is nvarchar(100) and the field Description in Note is nvarchar(100) and encrypted with Always Encrypt (deterministic encryption). Both columns are nullable and have a non-clustered index on them. Both tables have a few tens of millions of records.
We have the following query in our code: var user = DbContext.Set<User>().FirstOrDefault(u => u.Notes.Any(n => n.Description == value));
This query used to run with with normal performance cost (few dozens msecs) before
encrypting the field in our db. This completely changed after the encryption. The execution time changed to a few tens of secs.
I rebuilt all indexes but it changed nothing.The above code statement is translated from entity framework into something like:
DECLARE @p__linq__0 nvarchar(100) = 'some text'
SELECT *
FROM [User]
WHERE EXISTS (SELECT 1 AS [C1]
FROM [Note]
WHERE ([User].[Id] = [Note].[UserId]) AND (([Note].[Description] = @p__linq__0) OR (([Note].[Description] IS NULL) AND (@p__linq__0 IS NULL))) )
Its execution plan in the database is included below. From this plan it is obvious that the index in Description is not used and a Clustered Index Scan is performed. That's why the performance of the query is poor.
The tricky part is that if we remove the OR (([Note].[Description] IS NULL) AND (@p__linq__0 IS NULL)))
part of the where clause the execution of the query happens instantly and the execution plan is the expected one (see below)
DECLARE @p__linq__0 nvarchar(100) = 'some text'
SELECT *
FROM [User]
WHERE EXISTS (SELECT 1 AS [C1]
FROM [Note]
WHERE ([User].[Id] = [Note].[UserId]) AND ([Note].[Description] = @p__linq__0) )
What is most interesting is that if we remove the ([Note].[Description] IS NULL)
!!! part of the where clause the performance degrades again in its poor state and the execution plan of the query is the previous one.
DECLARE @p__linq__0 nvarchar(100) = 'some text'
SELECT *
FROM [User]
WHERE EXISTS (SELECT 1 AS [C1]
FROM [Note]
WHERE ([User].[Id] = [Note].[UserId]) AND (([Note].[Description] = @p__linq__0) OR ((@p__linq__0 IS NULL))) )
If we do the same exact queries to towards the Order table which has exactly the same schema as Note but its Description field is not encrypted, the performance and the execution plans as expected in all cases.
I have seen the below relevant questions but they do not refer to encrypted columns. They refer to default entity framework behavior. 1 2
Thus, the question is: What is the impact of data encryption (with always encrypted) in the above case and how could we overcome it?
Finally the issue had nothing to do with data encryption. The actual issue came from the fact that the statistics of the database were out of date after a big data migration. We updated them using
EXEC sp_updatestats
and after that the query was executed using the expected execution plan and with expected performance.