Spliting of overlapping date intervals

131 views Asked by At

This is sample of data I have:

id,date_from,date_to,contract_type
308,01.01.2023,28.09.2023,1
308,04.03.2023,15.07.2023,2
308,01.10.2023,31.07.2024,1
477,02.04.2023,30.08.2023,1
477,01.06.2023,31.12.2023,2

And this is what data I would like to get:

id,date_from,date_to,contract_type
308,01.01.2023,03.03.2023,1
308,04.03.2023,15.07.2023,2
308,16.07.2023,28.09.2023,1
308,01.10.2023,31.07.2024,1
477,02.04.2023,31.05.2023,1
477,01.06.2023,31.12.2023,2

Basically I have two T-SQL tables, a pre-stage and a stage table. In the first table I get overlapping date periods of employee contracts. I want to split all the overlapping date periods into non-overlapping intervals and for each overlapping interval choose the higher number in contract_type. This is important.

Can anyone help me? Do I calculate each date and then group them by id and contract_type? Or is there a method to detect and split date ranges with a condition?

I should do this all in procedure that loads the stage table. I can do this, but I thought I should mention it.

Thanks to anyone in advance!

1

There are 1 answers

0
T N On

I don't know if there is a simpler way to do this that explicitly recognizes and handles specific overlap cases, but the following is a generalized divide, conquer, and merge approach.

For each id:

  1. Extract all distinct start and end dates for the current records.
  2. De-dup and sort the extracted dates.
  3. Generate date range rows between each pair of successive dates.
  4. Compare all existing contract rows against these date ranges and generate new contract rows for each overlap. (At this point, any overlaps in the original contracts are represented as exact date-range matches.)
  5. Merge rows having the same date-range, and select the max contract_type for each.
  6. Apply a gaps and islands algorithm for merging adjacent date ranges having the same contract type.

When doing date-range calculations, it is often easier to work with date ranges having an exclusive date (00:00 the next day) instead of inclusive end dates (implicitly 23:59:59.999* the last day). This avoids the constant off-by-one adjustments when comparing start vs end dates.

A standard date-range overlap comparison would then be start1 < end2 AND start2 < end1.

Code:

;WITH CTE_AdjustedContracts AS (
    SELECT
        id,
        date_from,
        DATEADD(day, 1, date_to) AS date_to_excl, -- Adjust to excludive end-date
        contract_type
    FROM Contracts
),
CTE_Dates AS (
    SELECT id, date_from AS Date FROM CTE_AdjustedContracts
    UNION -- implicitly DISTINCT
    SELECT id, date_to_excl AS Date FROM CTE_AdjustedContracts
),
CTE_DateRanges AS (
    SELECT
        id,
        Date AS date_from,
        LEAD(Date) OVER(PARTITION BY id ORDER BY date) AS date_to_excl
    FROM CTE_Dates
),
CTE_SelectedContracts AS (
    SELECT
        AC.id,
        DR.date_from,
        DR.date_to_excl,
        MAX(AC.contract_type) AS contract_type
    FROM CTE_AdjustedContracts AC
    JOIN CTE_DateRanges DR
        ON DR.id = AC.id
        AND DR.date_from < AC.date_to_excl
        AND AC.date_from < DR.date_to_excl
    GROUP BY AC.id, DR.date_from, DR.date_to_excl
),
CTE_MergedContracts AS (
    SELECT
        id,
        contract_type,
        MIN(date_from) AS date_from,
        MAX(date_to_excl) AS date_to_excl
    FROM (
        SELECT
            *,
            SUM(is_gap) OVER(PARTITION BY id, contract_type order by date_from) AS grp
        FROM (
            SELECT
                *,
                CASE WHEN date_from <> LAG(date_to_excl)
                    OVER(PARTITION BY id, contract_type order by date_from)
                THEN 1 ELSE 0 END AS is_gap
            FROM CTE_SelectedContracts
        ) G
    ) G
    GROUP BY id, contract_type, grp
)
SELECT
    id,
    date_from,
    DATEADD(day, -1, date_to_excl) AS date_to, -- Adjust back to original inclusive end-date
    contract_type
FROM CTE_MergedContracts MC
ORDER BY id, date_from

Results:

id date_from date_to contract_type
308 2023-01-01 2023-03-03 1
308 2023-03-04 2023-07-15 2
308 2023-07-16 2023-09-28 1
308 2023-10-01 2024-07-31 1
477 2023-04-02 2023-05-31 1
477 2023-06-01 2023-12-31 2

See this db<>fiddle for a demo.

NOTE: One side effect of this processing is that any adjacent or overlapping contracts of the same type in the original data may be merged into a single row in the final result.