Show Date Range in Custom Column - Gaps and Islands


I have a table that looks like this:

+------------+------+
|    Date    | Name |
+------------+------+
| 2017-01-07 | A    |
| 2017-01-08 | A    |
| 2017-01-09 | A    |
| 2017-01-12 | A    |
| 2017-01-07 | B    |
| 2017-01-08 | B    |
| 2017-01-09 | B    |
+------------+------+

I would like to be able to turn it into the following:

+-------------------------+------+
|       Date Range        | Name |
+-------------------------+------+
| 2017-01-07 - 2017-01-09 | A    |
| 2017-01-07 - 2017-01-09 | B    |
| 2017-01-12              | A    |
+-------------------------+------+

The code should find the minimum and maximum date of each run of consecutive dates for each Name, then show them as a 'from - to' string in a single column.

I'm having trouble grouping only the consecutive dates. Note that the third row above gets its own entry because 2017-01-12 is not consecutive with the earlier date range for 'A'.

EDIT: Please note: This is specific to SQL Server 2008, which does not allow use of the LAG function.


EDIT 2: The original answer supplied by McNets worked fine on SQL Server 2012. I've included it here as it's better if you have SQL Server 2012 onwards.

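;WITH CalcDiffDays AS
(
    -- NumDays = Name + number of days since the previous date for the same Name.
    -- The LAG default (the day before the current row) gives the first row of each Name a gap of 1.
    SELECT Date, Name,
           CONCAT(Name, CAST(DATEDIFF(DAY, LAG(Date, 1, DATEADD(DAY, -1, Date)) OVER (PARTITION BY Name ORDER BY Date), Date) AS VARCHAR(10))) AS NumDays
    FROM @tmpTable
)
-- One output row per (Name, NumDays) group, formatted as yyyy.mm.dd (style 102).
SELECT CONCAT(CONVERT(VARCHAR(20), MIN(Date), 102), ' - ', CONVERT(VARCHAR(20), MAX(Date), 102)) AS [Data Range], Name
FROM CalcDiffDays
GROUP BY NumDays, Name;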
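
A caveat on grouping by NumDays: every date inside a run has a gap of 1, so if a Name has two separate multi-day runs they fall into the same group and are merged into one range. A possible alternative for SQL Server 2012 onwards is sketched below. It flags the first row of each run with LAG and numbers the runs with a running SUM. The @Sample table variable, the extra 2017-01-13 row and the IsStart / IslandNo names are assumptions made for illustration, and the sketch assumes there are no duplicate (Name, Date) rows.

```sql
-- Hypothetical sample data: the question's rows plus 2017-01-13 for A,
-- so that A has two multi-day runs.
DECLARE @Sample TABLE ([Date] DATE NOT NULL, Name VARCHAR(10) NOT NULL);

INSERT INTO @Sample ([Date], Name)
VALUES ('20170107', 'A'), ('20170108', 'A'), ('20170109', 'A'),
       ('20170112', 'A'), ('20170113', 'A'),
       ('20170107', 'B'), ('20170108', 'B'), ('20170109', 'B');

WITH Flagged AS
(
    -- IsStart = 1 when the previous date for this Name is missing
    -- or is not exactly one day earlier.
    SELECT [Date], Name,
           CASE WHEN DATEDIFF(DAY, LAG([Date]) OVER (PARTITION BY Name ORDER BY [Date]), [Date]) = 1
                THEN 0 ELSE 1
           END AS IsStart
    FROM @Sample
),
Numbered AS
(
    -- The running total of IsStart numbers the runs within each Name.
    SELECT [Date], Name,
           SUM(IsStart) OVER (PARTITION BY Name ORDER BY [Date] ROWS UNBOUNDED PRECEDING) AS IslandNo
    FROM Flagged
)
SELECT CONVERT(CHAR(10), MIN([Date]), 120) + ' - ' + CONVERT(CHAR(10), MAX([Date]), 120) AS [Date Range],
       Name
FROM Numbered
GROUP BY Name, IslandNo
ORDER BY MIN([Date]), Name;
```

With this sample data, A's 2017-01-12 and 2017-01-13 rows form their own range instead of being merged with 2017-01-07 to 2017-01-09.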

Answer by McNets (best answer):

First, I add a row number across the whole table, ordered by Name and then Date.

WITH RowN AS
(
    SELECT Date, Name, ROW_NUMBER() OVER (ORDER BY Name, Date) RN
    FROM #T
)

Then I join this CTE to itself on the previous row number, which stands in for LAG on SQL Server 2008, to calculate the days between each date and the one before it.

,CalcDiffDays AS
(
    SELECT RowN.Date, RowN.Name,
        ISLAND = RowN.Name + 
        CASE  
           WHEN RowN.RN > 1 AND RowN.Name = R2.Name THEN CAST(DATEDIFF(day, R2.Date, RowN.Date) AS VARCHAR(20))
           ELSE '1'
        END
    FROM RowN
         LEFT JOIN RowN R2 ON R2.RN = RowN.RN-1
)

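GAPS. The number of days between each date and the previous date of the same name (1 for the first row of each name).

ISLANDS. The name concatenated with that gap, so rows that share a name and a gap size get the same ISLAND value.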

+---------------------+------+---------+
|         Date        | Name |  ISLAND |
+---------------------+------+---------+
| 07.01.2017 00:00:00 |   A  |    A1   |
+---------------------+------+---------+
| 08.01.2017 00:00:00 |   A  |    A1   |
+---------------------+------+---------+
| 09.01.2017 00:00:00 |   A  |    A1   |
+---------------------+------+---------+
| 12.01.2017 00:00:00 |   A  |    A3   |
+---------------------+------+---------+
| 07.01.2017 00:00:00 |   B  |    B1   |
+---------------------+------+---------+
| 08.01.2017 00:00:00 |   B  |    B1   |
+---------------------+------+---------+
| 09.01.2017 00:00:00 |   B  |    B1   |
+---------------------+------+---------+

The second part: get the MIN and MAX Date of each island.

WITH RowN AS
(
    SELECT Date, Name, ROW_NUMBER() OVER (ORDER BY Name, Date) RN
    FROM #T
)
,CalcDiffDays AS
(
    SELECT RowN.Date, RowN.Name,
        ISLAND = RowN.Name + 
        CASE  
           WHEN RowN.RN > 1 AND RowN.Name = R2.Name THEN CAST(DATEDIFF(day, R2.Date, RowN.Date) AS VARCHAR(20))
           ELSE '1'
        END
    FROM RowN
         LEFT JOIN RowN R2 ON R2.RN = RowN.RN-1
)    
SELECT CONVERT(VARCHAR(20), MIN(Date), 102) + ' - ' + CONVERT(VARCHAR(20), MAX(Date), 102) AS [Data Range], Name
FROM CalcDiffDays
GROUP BY ISLAND, Name
ORDER BY MIN(Date);

+-------------------------+------+
|        Data Range       | Name |
+-------------------------+------+
| 2017.01.07 - 2017.01.09 |   A  |
+-------------------------+------+
| 2017.01.07 - 2017.01.09 |   B  |
+-------------------------+------+
| 2017.01.12 - 2017.01.12 |   A  |
+-------------------------+------+
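
One limitation worth noting: the ISLAND key above is the name plus the gap size, so it works for this sample but merges separate runs of the same name, because every date inside a run has a gap of 1. If A also had 2017-01-13, that row would get A1 and the first range would be reported as 2017.01.07 - 2017.01.13. A common alternative that still runs on SQL Server 2008 is to subtract a per-name ROW_NUMBER from the date: within a run of consecutive dates the result is constant, so it can serve as the grouping key. The following is only a sketch under assumptions: the @Sample table variable, the extra 2017-01-13 row and the IslandKey name are made up for illustration, and it assumes there are no duplicate (Name, Date) rows. It also prints a single date when a run has one day, as the question asked.

```sql
-- Hypothetical sample data: the question's rows plus 2017-01-13 for A.
DECLARE @Sample TABLE ([Date] DATE NOT NULL, Name VARCHAR(10) NOT NULL);

INSERT INTO @Sample ([Date], Name)
VALUES ('20170107', 'A'), ('20170108', 'A'), ('20170109', 'A'),
       ('20170112', 'A'), ('20170113', 'A'),
       ('20170107', 'B'), ('20170108', 'B'), ('20170109', 'B');

WITH Islands AS
(
    -- Date minus its per-Name row number: consecutive dates advance by one day
    -- while the row number advances by one, so the difference stays constant.
    SELECT [Date], Name,
           DATEADD(DAY,
                   -CAST(ROW_NUMBER() OVER (PARTITION BY Name ORDER BY [Date]) AS INT),
                   [Date]) AS IslandKey
    FROM @Sample
)
SELECT CASE WHEN MIN([Date]) = MAX([Date])
            THEN CONVERT(CHAR(10), MIN([Date]), 120)          -- single-day run
            ELSE CONVERT(CHAR(10), MIN([Date]), 120) + ' - '
               + CONVERT(CHAR(10), MAX([Date]), 120)          -- multi-day run
       END AS [Date Range],
       Name
FROM Islands
GROUP BY Name, IslandKey
ORDER BY MIN([Date]), Name;

-- Expected rows for the sample above:
--   2017-01-07 - 2017-01-09 | A
--   2017-01-07 - 2017-01-09 | B
--   2017-01-12 - 2017-01-13 | A
```

If duplicate (Name, Date) rows are possible, DENSE_RANK in place of ROW_NUMBER keeps the key constant across the duplicates.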

You can check it here: http://rextester.com/MHLEEJ50479