Should recursive common table expressions over dmvs be built on cached data?

97 views Asked by At

I have written a little CTE to get the total blocking time of a head blocker process, and I am unsure if I should first copy all of the processes that I want the CTE to run over into a temp table and then perform the query over this - i.e. I want to be sure that the data cannot change under my feet whilst the query runs and (worst case scenario), I end up with an infinite recursive loop!

This is my SQL including the temp table - I'd prefer not to have to use the table for performance reasons, and go directly to the sysprocesses dmv inside my CTE, but I'm not sure of the possible implications of this.

DECLARE @proc TABLE(
    spid SMALLINT PRIMARY KEY, 
    blocked SMALLINT INDEX blocked_index, 
    waittime BIGINT)

INSERT INTO @proc 
SELECT spid, blocked, waittime 
FROM master..sysprocesses 

;WITH block_cte AS 
(
    SELECT spid, CAST(blocked AS BIGINT) [wait_time], spid [root_spid] 
    FROM @proc
    WHERE blocked = 0
    UNION ALL
    SELECT blocked.spid, blocked.waittime, block_cte.spid 
    FROM @proc AS blocked
    INNER JOIN block_cte ON blocked.blocked = block_cte.spid
)
SELECT root_spid blocking_spid, SUM(wait_time) total_blocking_time
FROM block_cte 
GROUP BY root_spid
1

There are 1 answers

1
David Rushton On

This question is probably best transfered to Stack DBA. I'm sure those clever guys and girls can not only tell you the answer but also the reason behind it.

Not being sure myself I decided to test it...

My script captures the record count fromsysProcesses 1,000 times. Now to do this I had to circumnavigate several limits placed on CTEs. Among other restrictions; you cannot use aggregate functions. This makes counting records quite hard. So I created an inline table function to return the current row count from sysProcesses.

sysProcess Count Function

CREATE FUNCTION ProcessCount()
RETURNS TABLE 
AS
RETURN 
(
    -- Return the current process count.
    SELECT 
        COUNT(*) AS RecordCount
    FROM
        Master..sysProcesses
)
;

I wrapped this function in a CTE.

CTE

WITH RCTE AS 
(
    /* CTE to test if recursion is effected by updates to 
     * underlying data.
     */
        -- Anchor part.
        SELECT 
            1 AS ExecutionCount,
            1 AS JoinField,
            RecordCount
        FROM
            ProcessCount()          

    UNION ALL

        -- Recursive part.
        SELECT 
            r.ExecutionCount + 1    AS ExecutionCount,
            1 AS JoinField,
            pc.RecordCount
        FROM
            ProcessCount() AS pc
                INNER JOIN RCTE AS r        ON r.JoinField = 1
        WHERE
            r.ExecutionCount < 1000

)
SELECT 
    MIN(RecordCount)    AS MinRecordCount,
    MAX(RecordCount)    AS MaxRecordCount
FROM 
    RCTE
OPTION 
    (MAXRECURSION 1000)
;
GO

If the min and max record counts are always equal this would suggest there is only one consistent view of sysProcesses, used throughout the query. Any difference proves this is not the case. Running on SQL Server 2008 R2 I did find differences:

Results

Run    Min   Max
1      113   254
2      107   108
3      86    108

Of course the inline function could be to blame here. It certainly changed my execution plan. This has taught me a lesson. I really need to better understand execution plans. I'm sure reading the OPs plan would provide a definitive answer.