Merge operations on historic table

108 views Asked by At

I would like to know if there is an existing, well-established algorithm to perform merge operations on a temporal table.

As an minimal example, say we have a table [pricedata] as such:

+------+------+--------+
| from | to   | price  |
+------+------+--------+
| 2008 | 2009 | 100    |
| 2009 | 2011 | 121    |
| 2011 | 2013 | 142    |
+------+------+--------+

(the [to] column here is supposed to be noninclusive of itself). Now, we gets some new data to overwrite the original:

+------+------+--------+
| from | to   | price  |
+------+------+--------+
| 2010 | 2012 | 109    |
+------+------+--------+

The expected result should split and rearrange prices accordingly:

+------+------+--------+
| from | to   | price  |
+------+------+--------+
| 2008 | 2009 | 100    |
| 2009 | 2010 | 121    |
| 2010 | 2012 | 109    |
| 2012 | 2013 | 142    |
+------+------+--------+

I purposefully omit too many details, because the question is if there is already an important resource/algorithm that already does that. It feel common-need enough to be around, though my websearch has not produced anything.

1

There are 1 answers

4
T N On

Note This answer has been significantly revised from the original post to properly handle overlapping cases requiring a split.

I am not sure id there is a well-established (well known) algorithm, but I believe the following may include the logic you are looking for.

For data with inclusive start dates and exclusive end dates, a test for overlapping dates would be (A.FromDate < B.ToDate AND B.FromDate < A.ToDate).

Before inserting the new row, you can query for any existing rows that overlap the date range being inserted and adjust the existing start and end dates. These will need to be adjusted, split, or possibly even deleted to eliminate overlaps.

  1. If the new range overlaps the lower end of an existing row, the from value of the existing row should be adjusted.
  2. If the new range overlaps the upper end of an existing row, the to value of the existing row should be adjusted.
  3. If the new range is contained entirely within the range of an existing row, that existing row will need to be duplicated and adjusted to cover the residual lower and upper ranges not superseded by the new insert.
  4. If any existing rows are contained entirely within the new range being inserted, those rows may be deleted.

I have assumed that new data always supersedes existing data with an overlapping range, and that rows that have been entirely superseded may be deleted.

The following logic will handle all of the above cases:

-- Delete any row completely superseded by new insert
DELETE pricedata
WHERE [from] >= @InsertFrom
AND [to] <= @InsertTo 

-- Duplicate and adjust any row split by new insert, leaving lower residual
INSERT pricedata ([from], [to], price)
SELECT [from], @InsertFrom, price
FROM pricedata
WHERE [from] < @InsertFrom
AND [to] > @InsertTo 

-- Adjust original row split by new insert, leaving upper residual
UPDATE pricedata
SET [from] = @InsertTo
WHERE [from] < @InsertFrom
AND [to] > @InsertTo 

-- Trim lower end of row overlapping upper end of insert
UPDATE pricedata
SET [from] = @InsertTo
WHERE [from] >= @InsertFrom
AND [from] < @InsertTo 

-- Trim upper end of row overlapping lower end of insert
UPDATE pricedata
SET [to] = @InsertFrom
WHERE [to] > @InsertFrom
AND [to] <= @InsertTo 

-- Now that we have resolved any overlaps, we are clear to insert the new row
INSERT pricedata ([from], [to], price)
VALUES (@InsertFrom, @InsertTo, @InsertPrice)

See this db<>fiddle that wraps the above logic up in a stored procedure and demonstrates its operation with the OP's scenario and several other cases.