SQL Query slow only when Row_Number values are used in STUFF

Question

Asked by JetRocket11 on 19 January 2023 at 14:26

I have a fairly basic SQL query that runs in 1 second without the Data_1 field, which uses STUFF() and filters and orders by RN. With Data_1 in the query, execution goes from 1 second to 25 seconds. If I remove the RN filter and the ORDER BY from the STUFF expression for Data_1, it goes back to 1 second with both Data_1 and Data_2 in the query. So the issue seems to be the RN piece within the STUFF.

Is there anything that can be done to make this run quickly without using temp tables? The same thing works fine with temp tables instead of CTE but the requirement is to have this code in a view.

There are only 350 total entries in the table and the result. Running on MS SQL Server 2016 (13.0.7016.1)

Note: the Data_1 field must show the 6 most recent updates per Program in the JSON string, but ordered from oldest to newest. That is the only reason I am using ROW_NUMBER, because the underlying data can have a lot more than 6 updates per Program.

WITH
CTE AS
    (SELECT P.Program_Number,
            P.Date_Status,
            '{"date":"' + P.Date_Status_Display + '","percent":"' + P.Percent_Complete + '","status":":' + P.Status_Overall_Col + '"}' AS JSON_String,
            ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
     FROM dbo.Main_Entries_Table AS P)
SELECT P.[Program_Number],
       P.[Program_Name],
       '[' + STUFF((SELECT ',' + [JSON_String]
                    FROM CTE C
                    WHERE C.Program_Number = P.Program_Number
                      AND RN <= 6
                    ORDER BY RN DESC
                   FOR XML PATH('')),1,1,'') + ']' AS Data_1,
       '[' + STUFF((SELECT ',' + [JSON_String]
                    FROM CTE C
                    WHERE C.Program_Number = P.Program_Number
                    ORDER BY Date_Status ASC
                   FOR XML PATH('')),1,1,'') + ']' AS Data_2,
       P.Last_Updated
FROM dbo.Main_Entries_Table P;

**Bernie156** · Answer 1 · 2023-01-19T16:01:04+00:00

First, STRING_AGG will not help you here. The optimizer leverages the same tricks to concatenate the string either way. STRING_AGG, however, is cleaner and handles conversions better but it would not solve this problem.
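
For reference, here is a rough sketch of what a STRING_AGG form of Data_1 could look like. STRING_AGG only exists in SQL Server 2017 and later, so it would not run on the 2016 instance from the question. The sketch assumes the #Main_Entries_Table sample table defined further down, uses a cutoff of 2 rows to suit that small sample (the question uses 6), and returns one row per program rather than one row per table row.

```sql
-- Sketch only: requires SQL Server 2017+ (STRING_AGG is not available on 2016).
-- Assumes the #Main_Entries_Table sample table from this answer.
WITH CTE AS
(
  SELECT P.Program_Number,
         P.Date_Status,
         '{"date":"' + LEFT(P.Date_Status_Display,4) + '","percent":"' + LEFT(P.Percent_Complete,6) +
         '","status":":' + LEFT(P.Status_Overall_Col,4) + '"}' AS JSON_String,
         ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
  FROM #Main_Entries_Table AS P
)
SELECT C.Program_Number,
       -- WITHIN GROUP makes the concatenation order explicit: oldest to newest
       '[' + STRING_AGG(C.JSON_String, ',') WITHIN GROUP (ORDER BY C.Date_Status ASC) + ']' AS Data_1
FROM CTE AS C
WHERE C.RN <= 2   -- the question uses 6; 2 suits the small sample
GROUP BY C.Program_Number;
```

The ROW_NUMBER step is unchanged here, so the sort it needs is still there; only the concatenation syntax differs.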

Next, for a good answer you should include DDL and sample data like so. This is what I'll use to show you what's up:

IF OBJECT_ID('tempdb..#Main_Entries_Table') IS NOT NULL DROP TABLE #Main_Entries_Table;

CREATE TABLE #Main_Entries_Table
(
  Program_Number      INT,
  [Program_Name]      VARCHAR(20),
  Date_Status         INT,
  Status_Overall_Col  INT,
  Percent_Complete    DECIMAL(4,2),
  Date_Status_Display VARCHAR(10)
);

INSERT #Main_Entries_Table
VALUES(1,'ABC',1,10,.1,'Yay!'),(1,'ABC',1,40,.95,'blah'),(1,'XYZ',0,10,.03,'NA'),
      (1,'ABC',3,44,.2,'Booo'),(1,'ABC',1,33,.35,'blah'),(1,'XYZ',0,999,.73,'NA'),
      (2,'RRR',1,10,.1,'Booo'),(2,'RRR',1,90,.44,'blah'),(2,'RRR',0,10,.03,'NA'),
      (2,'RRR',3,44,.2,'Booo'),(2,'RRR',1,93,.44,'blah'),(2,'RRR',0,55,.73,'NA');

Now let's look at your CTE query and the execution plan:

CTE Query Section

SELECT P.Program_Number,
        P.Date_Status,
        '{"date":"' + LEFT(P.Date_Status_Display,4) + '","percent":"' + LEFT(P.Percent_Complete,6) + 
        '","status":":' + LEFT(P.Status_Overall_Col,4) + '"}' 
          AS JSON_String,          
        ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
FROM #Main_Entries_Table AS p

Execution plan:

Depending on your data, that can be a big ol' expensive sort. This index will fix that:

CREATE CLUSTERED INDEX idx_123 ON #Main_Entries_Table(Program_Number ASC, Date_Status DESC);

This is what Itzik Ben-Gan calls a POC index, which stands for Partition, Order, Cover. The index keys handle the PARTITION BY column first, then the ORDER BY column, and because the index is clustered it covers all required columns. If your real table already has a clustered index, you would likely have to create a non-clustered index with the correct covering columns instead.
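
A non-clustered version for the real table might look roughly like the sketch below. The index name is hypothetical, and the INCLUDE list simply assumes the three columns that the question's CTE concatenates into JSON_String; adjust it to whatever your query actually reads.

```sql
-- Sketch only: hypothetical index name; column list assumed from the question's CTE.
CREATE NONCLUSTERED INDEX IX_Main_Entries_Table_POC
  ON dbo.Main_Entries_Table (Program_Number ASC, Date_Status DESC)      -- Partition, then Order
  INCLUDE (Date_Status_Display, Percent_Complete, Status_Overall_Col);  -- Cover
```

The key columns mirror the PARTITION BY and ORDER BY of the ROW_NUMBER call, and the included columns let the CTE be answered from the index alone without lookups into the base table.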

New Execution plan:

Now for your Data_2 column (excluding Data_1):

WITH
CTE AS
(
  SELECT P.Program_Number,
          P.Date_Status,
          '{"date":"' + LEFT(P.Date_Status_Display,4) + '","percent":"' + LEFT(P.Percent_Complete,6) + 
          '","status":":' + LEFT(P.Status_Overall_Col,4) + '"}' 
            AS JSON_String,          
        ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
  FROM #Main_Entries_Table AS p
)
SELECT
P.[Program_Number],
       P.[Program_Name],
       '[' + STUFF((SELECT ',' + [JSON_String]
                    FROM CTE C
                    WHERE C.Program_Number = P.Program_Number
                    ORDER BY Date_Status ASC
                   FOR XML PATH('')),1,1,'') + ']' AS Data_2
FROM #Main_Entries_Table AS P;

Execution plan:

Both queries (inside and outside the CTE) leverage the index to eliminate the sort and to perform a seek against your rows (rather than a scan, which reads every row). Now for your Data_1 column.

WITH
CTE AS
(
  SELECT P.Program_Number,
          P.Date_Status,
          '{"date":"' + LEFT(P.Date_Status_Display,4) + '","percent":"' + LEFT(P.Percent_Complete,6) + 
          '","status":":' + LEFT(P.Status_Overall_Col,4) + '"}' 
            AS JSON_String,          
        ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
  FROM #Main_Entries_Table AS p
)
SELECT
P.[Program_Number],
       P.[Program_Name],
       '[' + STUFF((SELECT ',' + [JSON_String]
                    FROM CTE C
                    WHERE C.Program_Number = P.Program_Number
                      AND RN <= 2
                    --ORDER BY RN DESC
                   FOR XML PATH('')),1,1,'') + ']' AS Data_1
FROM #Main_Entries_Table AS p;

Here you will get a sort and a scan if you include the ORDER BY clause. That said, you don't need it. With the aforementioned index in place, this will be quite fast:

WITH
CTE AS
(
  SELECT P.Program_Number,
    P.Date_Status,
    '{"date":"' + LEFT(P.Date_Status_Display,4) + '","percent":"' + LEFT(P.Percent_Complete,6) + 
    '","status":":' + LEFT(P.Status_Overall_Col,4) + '"}' AS JSON_String,          
    ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
  FROM #Main_Entries_Table AS p
)
SELECT
P.[Program_Number],
       P.[Program_Name],
       '[' + STUFF((SELECT ',' + [JSON_String]
                    FROM CTE C
                    WHERE C.Program_Number = P.Program_Number
                      AND RN <= 2
                   -- ORDER BY RN DESC
                   FOR XML PATH('')),1,1,'') + ']' AS Data_1,
       '[' + STUFF((SELECT ',' + [JSON_String]
                    FROM CTE C
                    WHERE C.Program_Number = P.Program_Number
                    ORDER BY Date_Status ASC
                   FOR XML PATH('')),1,1,'') + ']' AS Data_2
FROM #Main_Entries_Table AS P;

Check out the final plan:

The key here is understanding how to analyze the execution plan data to tune your SQL.
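
As one way to put that into practice, the sketch below turns on I/O and timing statistics so that variants can be compared in the Messages output alongside their plans. It also tries a variant that is not part of the answer above: SQL Server does not guarantee the concatenation order when the subquery has no ORDER BY, and the question asks for oldest-to-newest output, so this version picks the newest rows with TOP and then orders only those few rows explicitly. It assumes the #Main_Entries_Table sample table and the idx_123 index from this answer; whether it beats the ROW_NUMBER form on your data is something to measure, not assume.

```sql
-- Sketch only, not part of the original answer.
-- Assumes #Main_Entries_Table and the idx_123 index created earlier.
SET STATISTICS IO ON;    -- report logical reads per table
SET STATISTICS TIME ON;  -- report CPU and elapsed time

SELECT P.Program_Number,
       P.Program_Name,
       '[' + STUFF((SELECT ',' + T.JSON_String
                    FROM (SELECT TOP (2)   -- the question uses 6
                                 E.Date_Status,
                                 '{"date":"' + LEFT(E.Date_Status_Display,4) + '","percent":"' + LEFT(E.Percent_Complete,6) +
                                 '","status":":' + LEFT(E.Status_Overall_Col,4) + '"}' AS JSON_String
                          FROM #Main_Entries_Table AS E
                          WHERE E.Program_Number = P.Program_Number
                          ORDER BY E.Date_Status DESC) AS T   -- newest N rows per program
                    ORDER BY T.Date_Status ASC                -- emitted oldest to newest
                    FOR XML PATH('')),1,1,'') + ']' AS Data_1
FROM #Main_Entries_Table AS P;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

As with ROW_NUMBER, rows that tie on Date_Status can come back in any order, so add a tiebreaker column to both ORDER BY clauses if the real data has duplicates.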
