Bottle neck on SORT operation

138 views Asked by At

I have the following query and it's proving very costly and is taking 6-8 seconds to execute. Looking at the execution plan, the cost is 79% on a SORT operation. Can I get any improvement here?

IMG

    SELECT
        A.StageName,
        C.Month,
        C.MonthName as Label,
        C.Year,
        isnull(A.Average,0) as Data
    FROM
    (   
        SELECT 
            S.StageName,
            MONTH(TimeIn) as MonthNumber, 
            DATENAME(MONTH,TimeIn) as Month, 
            YEAR(TimeIn) as Year, 
            ISNULL(AVG(DATEDIFF(mi,TimeIn,isnull(TimeOut,@TodayDate))),0) as Average
        FROM
            VisitMovement VM
        INNER JOIN Stage S on
            VM.StageID = S.StageID
        WHERE 
            (VM.TimeIn >= @StartDate AND 
            VM.TimeIn  < DATEADD (d,1,@EndDate)) AND
            (VM.TimeOut < DATEADD (d,1,@EndDate) OR VM.TimeOut IS NULL)
        GROUP BY
            S.StageNumber,
            S.StageName,
            MONTH(TimeIn), 
            DATENAME(MONTH,TimeIn), 
            YEAR(TimeIn)
    ) A
    RIGHT JOIN (select distinct Month,MonthName,Year from Calendar WHERE DATE >= @StartDate AND DATE < DATEADD (d,1,@EndDate)) C on
        A.MonthNumber = C.Month and
        A.Month = C.MonthName and
        A.Year = C.Year 
    GROUP BY
        A.StageName,
        C.Month,
        C.MonthName,
        C.Year,
        A.Average   
    ORDER BY  
        CASE WHEN @Ordering = 'asc'  THEN C.Year            END ASC, 
        CASE WHEN @Ordering = 'asc'  THEN C.Month           END ASC,
        CASE WHEN @Ordering = 'asc'  THEN A.StageName       END ASC,
        CASE WHEN @Ordering = 'desc' THEN C.Year            END DESC, 
        CASE WHEN @Ordering = 'desc' THEN C.Month           END DESC,
        CASE WHEN @Ordering = 'desc'  THEN A.StageName      END DESC
2

There are 2 answers

1
deroby On BEST ANSWER

Although I understand you can't get rid of the GROUP BY on the different columns in the sub-query, you can make it easier for the system.

Currently you have

S.StageNumber,
S.StageName,
MONTH(TimeIn), 
DATENAME(MONTH,TimeIn), 
YEAR(TimeIn)

Which I guess is quite a lot of data to go through. Allow me to make some guesses:

S.StageNumber,  -- int, 4 bytes
S.StageName, -- string, 20 bytes
MONTH(TimeIn),  -- int, 4 bytes
DATENAME(MONTH,TimeIn),  -- string 5 bytes
YEAR(TimeIn) -- int, 4 byte

Now, there are some dependencies:

  • If you know the MONTH (number), then you also now the name of it
  • I'm guessing that StageName + StageNumber is unique and directly related to StageID. If not you may have to GROUP BY once more in the outer layer.

That would bring us to

S.StageID,  -- int, 4 bytes
MONTH(TimeIn),  -- int, 4 bytes
YEAR(TimeIn) -- int, 4 byte

This means that the sorting for the GROUP BY has to run through only 12 bytes per record instead of the 37 bytes per record it was before, and numbers sort quite a bit faster than strings anyway (e.g. due to upper/lower case, accents, etc..)

I've tried to rewrite the query accordingly (Untested!). I've also moved the fetch of Month-information to a separate temp-table, this should help the Query Optimizer a little bit.

SELECT DISTINCT Month,MonthName,Year 
  INTO #dates
  FROM Calendar 
 WHERE DATE >= @StartDate AND DATE < DATEADD (d,1,@EndDate)

CREATE UNIQUE CLUSTERED INDEX uq0_#dates ON #dates (Month,Year)

SELECT
        A.StageName,
        C.Month,
        C.MonthName as Label,
        C.Year,
        isnull(A.Average,0) as Data
    FROM 
    (   
        SELECT S.StageName,
               MonthNumber,
               Year,
               Average
          FROM (    
                  SELECT VM.StageID,
                         MONTH(TimeIn) as MonthNumber, 
                         YEAR(TimeIn) as Year, 
                         ISNULL(AVG(DATEDIFF(mi,TimeIn,isnull(TimeOut,@TodayDate))),0) as Average
                    FROM
                         VisitMovement VM
                   WHERE 
                        (VM.TimeIn >= @StartDate AND 
                        VM.TimeIn  < DATEADD (d,1,@EndDate)) AND
                        (VM.TimeOut < DATEADD (d,1,@EndDate) OR VM.TimeOut IS NULL)
                  GROUP BY
                        VM.StageID,
                        MONTH(TimeIn), 
                        YEAR(TimeIn)
               ) grp
          JOIN Stage S 
            ON S.StageID = grp.StageID
        )
    ) A
    RIGHT OUTER JOIN #dates C
    on
        A.MonthNumber = C.Month and
        -- A.Month = C.MonthName and
        A.Year = C.Year 
    ORDER BY  
        CASE WHEN @Ordering = 'asc'  THEN C.Year            END ASC, 
        CASE WHEN @Ordering = 'asc'  THEN C.Month           END ASC,
        CASE WHEN @Ordering = 'asc'  THEN A.StageName       END ASC,
        CASE WHEN @Ordering = 'desc' THEN C.Year            END DESC, 
        CASE WHEN @Ordering = 'desc' THEN C.Month           END DESC,
        CASE WHEN @Ordering = 'desc'  THEN A.StageName      END DESC

Hope this helps.

1
Me.Name On

Because the order by has to evaluate each row, I don't think it can make optimal use of indices. Replacing the order by with a row_number() for the default sorting and only reversing the order once should at the least prevent multiple evaluations of @Ordering for a single row.

In the pseudo code below, the original query is put in a CTE. Row_number determines the ascending sort and below the cte the order is reversed if needed:

;with cte as
(
    SELECT
        A.StageName,
        C.Month,
        ...,
        row_number() over (order by C.Year,C.Month,A.StageName) sortOrder
    FROM
       ...rest of the query, excluding the order by
)
select * --or list the columns without the sortOrder 
from cte
order by sortOrder * case @Ordering when 'desc' then -1 else 1 end