How do I calculate the length of a streak?


I have the following data:

date          unit     status
2023-04-30    unit1    1
2023-05-31    unit1    1
2023-08-31    unit1    1
2023-09-30    unit1    1
2023-11-30    unit1    1
2023-12-31    unit1    1
2024-01-31    unit1    1
2024-02-28    unit1    1

For a given reference date, I would like to know the length of the first upcoming "streak" of consecutive months with status=1. The query has to run on MSSQL (used in production) and on SQLite (used for unit tests).

Example 1:

For date 2023-05-15 my desired output is:

unit     streak
unit1    3

The reason is that the first month with status=1 after 2023-05 is 2023-08; from there I count each consecutive month.

Example 2:

For date 2023-11-01 my desired output is:

unit     streak
unit1    3

The reason is that the first month with status=1 after 2023-11 is 2023-12, and the streak ends on 2024-02 because months with status=0 are not recorded and the next month with status=1 is more than a month away.
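Since SQLite is used for unit tests, a plain-Python oracle can be handy for checking whichever query ends up in production. This is a minimal sketch that assumes the dates of the status=1 rows are already loaded; the function name `first_upcoming_streak` is hypothetical. Like Example 2, it skips the reference month itself and starts counting at the first later month.

```python
from datetime import date
from typing import Iterable


def first_upcoming_streak(status_dates: Iterable[date], reference: date) -> int:
    """Length of the first run of consecutive months strictly after the
    reference month; `status_dates` are the dates of rows with status = 1."""
    ref_month = reference.year * 12 + reference.month
    # Convert each date to a month index and keep only months after the reference month.
    months = sorted({d.year * 12 + d.month for d in status_dates
                     if d.year * 12 + d.month > ref_month})
    if not months:
        return 0  # no upcoming month with status = 1
    streak = 1
    for prev, cur in zip(months, months[1:]):
        if cur != prev + 1:
            break  # a missing month ends the streak
        streak += 1
    return streak


rows = [date(2023, 4, 30), date(2023, 5, 31), date(2023, 8, 31), date(2023, 9, 30),
        date(2023, 11, 30), date(2023, 12, 31), date(2024, 1, 31), date(2024, 2, 28)]
print(first_upcoming_streak(rows, date(2023, 11, 1)))  # 3
```

With the rows exactly as listed above, a reference date of 2023-05-15 gives 2 (2023-08 and 2023-09), because no 2023-10 row appears in the sample.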

Accepted answer by SelVazi

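This is a gaps-and-islands problem. Shifting each date back by its ROW_NUMBER() in months (value minus row_number) gives a value that is invariant within a run of consecutive months, so it can serve as a grouping key. The start and end dates are then just the MIN() and MAX() of each group, and COUNT(*) is the streak length: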

WITH cte as (
  SELECT  *, GroupingSet = FORMAT(DATEADD(
                             MONTH, - ROW_NUMBER() OVER(PARTITION BY unit ORDER BY [date]), 
                             [date]
                           ), 'yyyy-MM-01')
  FROM    mytable
  WHERE [date] > EOMONTH('2023-05-15') AND [status] = 1
)
SELECT  TOP 1 unit,
        StartDate = MIN([date]),
        EndDate = MAX([date]),
        streak = COUNT(*)
FROM    cte
GROUP BY unit, GroupingSet
ORDER BY StartDate;

NB: the shifted dates are formatted as the first day of their month, so rows in the same streak get the same GroupingSet even when their day of month differs (DATEADD keeps the original day where it can)!
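To see both the invariant and why the normalization is needed, here is a rough Python emulation of the GroupingSet expression on the sample rows after 2023-05. The helper `add_months` is hypothetical and only approximates DATEADD(MONTH, ...), which keeps the day of month but clamps it to the last day of a shorter target month. The shifted dates differ in their day (2023-07-31 vs 2023-07-30), but once formatted to the first of the month each streak shares a single key.

```python
import calendar
from datetime import date


def add_months(d: date, n: int) -> date:
    """Rough stand-in for SQL Server DATEADD(MONTH, n, d): the day of month is
    kept, but clamped to the last day of the target month if it would overflow."""
    total = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


# Sample rows with status = 1 after the reference month 2023-05.
dates = [date(2023, 8, 31), date(2023, 9, 30), date(2023, 11, 30),
         date(2023, 12, 31), date(2024, 1, 31), date(2024, 2, 28)]
# Shift each date back by its row number (1-based, ordered by date).
shifted = [add_months(d, -rn) for rn, d in enumerate(dates, start=1)]
# Keep only year and month, like FORMAT(..., 'yyyy-MM-01').
keys = [s.strftime("%Y-%m-01") for s in shifted]
for d, s, k in zip(dates, shifted, keys):
    print(d, s, k)
```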


———

Using the same row_number method, we can obtain the first upcoming streak for each unit when there are several units, by ranking each unit's streaks by start date:

WITH cte AS (
  SELECT  *, GroupingSet = FORMAT(DATEADD(
                             MONTH, - ROW_NUMBER() OVER(PARTITION BY unit ORDER BY [date]),
                             [date]
                           ), 'yyyy-MM-01')
  FROM    mytable
  WHERE [date] > EOMONTH('2023-05-15') AND [status] = 1
),
cte2 AS (
  SELECT  unit,
          StartDate = MIN([date]),
          EndDate = MAX([date]),
          streak = COUNT(*)
  FROM    cte
  GROUP BY unit, GroupingSet
),
cte3 AS (
  -- rank each unit's streaks chronologically so that rn = 1 is the first upcoming one
  SELECT *, ROW_NUMBER() OVER (PARTITION BY unit ORDER BY StartDate) AS rn
  FROM cte2
)
SELECT unit, StartDate, EndDate, streak
FROM cte3
WHERE rn = 1;
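The T-SQL above relies on FORMAT, DATEADD and EOMONTH, which SQLite does not have. A possible SQLite port for the unit-test side, run through Python's sqlite3 module, swaps in an integer month index (year * 12 + month) as the grouping key and computes the first day of the month after the reference date with date(?, 'start of month', '+1 month'). It ranks each unit's islands by start date so that rn = 1 is the first upcoming streak. It assumes SQLite 3.25 or later (for window functions) and dates stored as 'YYYY-MM-DD' text; unit2 and its rows are hypothetical, added only to show more than one unit.

```python
import sqlite3

# SQLite port of the per-unit query (window functions need SQLite 3.25+).
QUERY = """
WITH cte AS (
  SELECT unit, [date],
         (CAST(strftime('%Y', [date]) AS INTEGER) * 12
          + CAST(strftime('%m', [date]) AS INTEGER))
         - ROW_NUMBER() OVER (PARTITION BY unit ORDER BY [date]) AS grp
  FROM mytable
  WHERE [date] >= date(?, 'start of month', '+1 month') AND status = 1
),
islands AS (
  SELECT unit, MIN([date]) AS start_date, MAX([date]) AS end_date,
         COUNT(*) AS streak
  FROM cte
  GROUP BY unit, grp
),
ranked AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY unit ORDER BY start_date) AS rn
  FROM islands
)
SELECT unit, start_date, end_date, streak FROM ranked WHERE rn = 1 ORDER BY unit
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable ([date] TEXT, unit TEXT, status INTEGER)")
unit1 = ["2023-04-30", "2023-05-31", "2023-08-31", "2023-09-30",
         "2023-11-30", "2023-12-31", "2024-01-31", "2024-02-28"]
unit2 = ["2023-06-30", "2023-07-31", "2023-08-31", "2023-10-31"]  # hypothetical unit
conn.executemany("INSERT INTO mytable VALUES (?, ?, 1)",
                 [(d, "unit1") for d in unit1] + [(d, "unit2") for d in unit2])
result = conn.execute(QUERY, ("2023-05-15",)).fetchall()
for row in result:
    print(row)
```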


Answer by suchislife

For both SQL Server and SQLite, the streak can be calculated with a common table expression (CTE) and the ROW_NUMBER() window function:

SQL Server Solution:

WITH RankedData AS (
    SELECT [date], [unit], [status],
           -- year-aware month index minus row number is constant within a run of consecutive months
           (YEAR([date]) * 12 + MONTH([date]))
           - ROW_NUMBER() OVER (PARTITION BY [unit] ORDER BY [date]) AS GroupingID
    FROM YourTableName
    WHERE [date] > EOMONTH('2023-05-15') AND [status] = 1
)
SELECT TOP 1 [unit], COUNT(*) AS streak
FROM RankedData
GROUP BY [unit], GroupingID
ORDER BY MIN([date]);
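Why the year matters for the grouping key: MONTH() restarts at 1 every January, so a key built from MONTH() and ROW_NUMBER() alone changes at the year boundary even when no month is missing. Here is a small Python check on the 2023-11 to 2024-02 rows from the question that compares a month-only key with a year-aware month index (year * 12 + month). `itertools.groupby` groups consecutive equal keys, which matches GROUP BY here because no key value recurs later.

```python
from datetime import date
from itertools import groupby

# Four consecutive months that cross the new year (from the question's sample).
dates = [date(2023, 11, 30), date(2023, 12, 31), date(2024, 1, 31), date(2024, 2, 28)]


def streaks(key):
    """Compute key(row_number, d) for each row and return the lengths of the runs."""
    keyed = [key(rn, d) for rn, d in enumerate(dates, start=1)]
    return [len(list(group)) for _, group in groupby(keyed)]


month_only = streaks(lambda rn, d: d.month - rn)                  # keys 10, 10, -2, -2
year_aware = streaks(lambda rn, d: (d.year * 12 + d.month) - rn)  # one constant key
print(month_only, year_aware)
```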

SQLite Solution: SQLite (version 3.25.0 and later) supports the same ROW_NUMBER() window function, but it has no MONTH() function and no TOP, so the month is extracted with strftime() and LIMIT replaces TOP:

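WITH RankedData AS (
    SELECT [date], [unit], [status],
           (CAST(strftime('%Y', [date]) AS INTEGER) * 12
            + CAST(strftime('%m', [date]) AS INTEGER))
           - ROW_NUMBER() OVER (PARTITION BY [unit] ORDER BY [date]) AS GroupingID
    FROM YourTableName
    -- first day of the month after the reference date, so the reference month is excluded
    WHERE [date] >= date('2023-05-15', 'start of month', '+1 month') AND [status] = 1
)
SELECT [unit], COUNT(*) AS streak
FROM RankedData
GROUP BY [unit], GroupingID
ORDER BY MIN([date])
LIMIT 1;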

These queries return the length of the first upcoming streak for the given reference date. Replace '2023-05-15' in the WHERE clause with the date you need.
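Rather than editing the date into the SQL, the reference date can be bound as a parameter, which also makes the query easy to unit-test. This is a minimal sketch using Python's sqlite3 module against the question's rows, assuming SQLite 3.25+ and 'YYYY-MM-DD' text dates. The query is a variant of the SQLite one above with a year-aware month index and the reference month excluded; `STREAK_SQL` and `first_streak` are hypothetical names.

```python
import sqlite3

# Single-unit SQLite query; the reference date is bound as the named parameter :ref.
STREAK_SQL = """
WITH RankedData AS (
    SELECT [date], [unit],
           (CAST(strftime('%Y', [date]) AS INTEGER) * 12
            + CAST(strftime('%m', [date]) AS INTEGER))
           - ROW_NUMBER() OVER (PARTITION BY [unit] ORDER BY [date]) AS GroupingID
    FROM YourTableName
    WHERE [date] >= date(:ref, 'start of month', '+1 month') AND [status] = 1
)
SELECT [unit], COUNT(*) AS streak
FROM RankedData
GROUP BY [unit], GroupingID
ORDER BY MIN([date])
LIMIT 1
"""


def first_streak(conn: sqlite3.Connection, reference_date: str):
    """Return (unit, streak) for the first upcoming streak, or None if there is none."""
    return conn.execute(STREAK_SQL, {"ref": reference_date}).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTableName ([date] TEXT, [unit] TEXT, [status] INTEGER)")
conn.executemany(
    "INSERT INTO YourTableName VALUES (?, 'unit1', 1)",
    [(d,) for d in ("2023-04-30", "2023-05-31", "2023-08-31", "2023-09-30",
                    "2023-11-30", "2023-12-31", "2024-01-31", "2024-02-28")],
)
print(first_streak(conn, "2023-11-01"))  # ('unit1', 3)
```

The function returns None when no month with status=1 follows the reference month.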