SQL: Ignore irrelevant JOIN when aggregating over the core table

80 views Asked by At

Suppose I have Tables:

dbo.Purchases
Id | Value | UserId
1 | 10.00 | 3
2 | 1.00 | 1
3 | 15.50 | 2
4 | 13.40 | 1

dbo.Users
Id (UQ) | Name
1 | Bob
2 | Sarah
3 | Alex

And a VIEW:

dbo.PurchasesWithUsers

SELECT *
  FROM dbo.Purchases
  LEFT JOIN dbo.Users ON Users.Id = UserId

And I'm going to run SELECT SUM(Value) FROM dbo.PurchasesWithUsers.

Now ... as a human, I can see that that JOIN doesn't affect that query:

  • It's obviously not directly used in the SUM.
  • It's a LEFT JOIN so it can't exclude Purchase rows.
  • It's joining to a column with a UQ constraint so it can't duplicate Purchase rows.

But when I run the query and look at the execution plan, the Engine (MS SQL Server) is still performing the JOIN, which degrades the performance :(.

Is there any way that I can give the engine additional clues that it can work out that it could completely skip the JOIN, whilst still using the VIEW as the thing I'm querying?


Context:

  • Obviously the tables are huge which is why the performance impact is material
  • The Tables and the View are obviously a little more complex than that, but not actually all that much - the logical simplification is still valid, and the UQ constraints are explicit (as either UQ CONSTRAINTs or UQ indexes).
  • The VIEW is being used so that the users can filter on a variety of different options. The Data API process those options and applies the relevant WHERE clauses to a single VIEW. Alas that means various of the JOINs aren't relevant to the VIEW depending on what filters have been chosen :(
  • I'm aware that I could materialise and directly index the VIEW, but I'd prefer to avoid that if possible, given that I can see that a simpler query plan could logically exist already.
1

There are 1 answers

1
D A On

Unless indexed, a view does not exist as a stored set of data values in a database. The rows and columns of data come from tables referenced in the query defining the view and are produced dynamically when the view is referenced. Because you don't want to "directly index the VIEW", there is no other data to work with and the server is getting the data by using the query behind the view so there is no work around to avoid that LEFT JOIN with this approch.