SQL Server get customer with 7 consecutive transactions

1.5k views Asked by At

I am trying to write a query that would get the customers with 7 consecutive transactions given a list of CustomerKeys.

I am currently doing a self join on Customer fact table that has 700 Million records in SQL Server 2008.

This is is what I came up with but its taking a long time to run. I have an clustered index as (CustomerKey, TranDateKey)

SELECT 
    ct1.CustomerKey,ct1.TranDateKey
FROM
    CustomerTransactionFact ct1
INNER JOIN 
    #CRTCustomerList dl ON ct1.CustomerKey = dl.CustomerKey --temp table with customer list
INNER JOIN 
    dbo.CustomerTransactionFact ct2 ON ct1.CustomerKey = ct2.CustomerKey -- Same Customer
                                    AND ct2.TranDateKey >= ct1.TranDateKey 
                                    AND ct2.TranDateKey <= CONVERT(VARCHAR(8), (dateadd(d, 6, ct1.TranDateTime), 112) -- Consecutive Transactions in the last 7 days
WHERE  
    ct1.LogID >= 82800000
    AND ct2.LogID >= 82800000
    AND ct1.TranDateKey between dl.BeginTranDateKey and dl.EndTranDateKey
    AND ct2.TranDateKey between dl.BeginTranDateKey and dl.EndTranDateKey
GROUP BY   
    ct1.CustomerKey,ct1.TranDateKey
HAVING 
    COUNT(*) = 7

Please help make it more efficient. Is there a better way to write this query in 2008?

2

There are 2 answers

0
Gordon Linoff On

You can do this using window functions, which should be much faster. Assuming that TranDateKey is a number and you can subtract a sequential number from it, then the difference constant for consecutive days.

You can put this in a query like this:

SELECT CustomerKey, MIN(TranDateKey), MAX(TranDateKey)
FROM (SELECT ct.CustomerKey, ct.TranDateKey,
             (ct.TranDateKey -
              DENSE_RANK() OVER (PARTITION BY ct.CustomerKey, ct.TranDateKey)
             ) as grp
      FROM CustomerTransactionFact ct INNER JOIN
           #CRTCustomerList dl
           ON ct.CustomerKey = dl.CustomerKey 
     ) t
GROUP BY CustomerKey, grp
HAVING COUNT(*) = 7;

If your date key is something else, there is probably a way to modify the query to handle that, but you might have to join to the dimension table.

0
dnoeth On

This would be a perfect task for a COUNT(*) OVER (RANGE ...), but SQL Server 2008 supports only a limited syntax for Windowed Aggregate Functions.

SELECT CustomerKey, MIN(TranDateKey), COUNT(*)
FROM 
 (
   SELECT CustomerKey, TranDateKey,
      dateadd(d,-ROW_NUMBER() 
                 OVER (PARTITION BY CustomerKey
                       ORDER BY TranDateKey),TranDateTime) AS dummyDate
   FROM CustomerTransactionFact 
 ) AS dt
GROUP BY CustomerKey, dummyDate
HAVING  COUNT(*) >= 7

The dateadd calculates the difference between the current TranDateTime and a Row_Number over all date per customer. The resulting dummyDatehas no actual meaning, but is the same meaningless date for consecutive dates.