I have a fairly basic SQL query which runs in 1 second without the Data_1 field which is performing a STUFF() and using RN for filter and order. With the Data_1 field in the query the execution goes from 1 second to 25 seconds. If I remove the RN filter and ORDER BY from the STUFF function of Data_1 then it goes back to executing in 1 second with both Data_1 and Data_2 in the query. So the issue seems to be with the RN piece within the STUFF.
Is there anything that can be done to make this run quickly without using temp tables? The same thing works fine with temp tables instead of CTE but the requirement is to have this code in a view.
There are only 350 total entries in the table and the result. Running on MS SQL Server 2016 (13.0.7016.1)
Note: Data_1 field requirement is to show 6 most recent updates per Program in the JSON string but in order from oldest to newest. That's the only reason why I am using ROW_NUMBER because the underlying data can have alot more than 6 updates per Program.
WITH
CTE AS
(SELECT P.Program_Number,
P.Date_Status,
'{"date":"' + P.Date_Status_Display + '","percent":"' + P.Percent_Complete + '","status":":' + P.Status_Overall_Col + '"}' AS JSON_String,
ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
FROM dbo.Main_Entries_Table)
SELECT P.[Program_Number],
P.[Program_Name],
'[' + STUFF((SELECT ',' + [JSON_String]
FROM CTE C
WHERE C.Program_Number = P.Program_Number
AND RN <= 6
ORDER BY RN DESC
FOR XML PATH('')),1,1,'') + ']' AS Data_1,
'[' + STUFF((SELECT ',' + [JSON_String]
FROM CTE C
WHERE C.Program_Number = P.Program_Number
ORDER BY Date_Status ASC
FOR XML PATH('')),1,1,'') + ']' AS Data_2,
P.Last_Updated
FROM dbo.Main_Entries_Table P;
First,
STRING_AGG
will not help you here. The optimizer leverages the same tricks to concatenate the string either way. STRING_AGG, however, is cleaner and handles conversions better but it would not solve this problem.Next, for a good answer you should include DDL and sample data like so. This is what I'll use to show you what's up:
Now lets look at your CTE query and the execution plan:
CTE Query Section
Execution plan:
Depending on your data, that can be a big ol' expensive sort. This index will fix that:
This ^^^ is what Itzik Ben-Gan calls a POC Index which stands for Partition, Order, Cover. This index handles the
PARTITION BY
clause first, then the 'ORDER BY' and, because it's clustered it covers all required columns. You would likely have to create a non-clustered index with the correct covering columns.New Execution plan:
Now for your Data_2 column (excluding Data_1):
Execution plan:
Both queries (inside and outside the CTE) leverage the index to eliminate the sort AND to perform a seek against your rows (vs a scan which is slower). Now for your Data_1 column.
Here you will get will get a sort and scan if you include the ORDER BY clause. That said, you don't need it. With the aforementioned index in place, this will be quite fast:
Check out the final plan:
The key here is understanding how to analyze the execution plan data to tune your SQL.