I'm having the following table sourced from a SCD2 table. From this source table, I have selected only a few columns, which reults in several lines looking exactly similar. I want to remove the unnecessary lines, those that contain the same data, and have the ValidFrom column showing the first value and ValidTo column showing the last value within "the timespan group".
Source data:
| Item | Color | ValidFrom | ValidTo |
| -------- | ---------- | ------------- | ---------- |
| Ball | Red | 2020-01-01 | 2020-03-24 |
| Ball | Blue | 2020-03-25 | 2020-04-12 |
| Ball | Blue | 2020-04-13 | 2020-05-07 |
| Ball | Blue | 2020-05-08 | 2020-11-14 |
| Ball | Red | 2020-11-15 | 9999-12-31 |
| Doll | Yellow | 2020-01-01 | 2020-03-24 |
| Doll | Green | 2020-03-25 | 2020-04-12 |
| Doll | Green | 2020-04-13 | 2020-05-07 |
| Doll | Green | 2020-05-08 | 2020-11-14 |
| Doll | Pink | 2020-11-15 | 9999-12-31 |
What I want to accomplish is this:
| Item | Color | ValidFrom | ValidTo |
| -------- | ---------- | ------------- | ---------- |
| Ball | Red | 2020-01-01 | 2020-03-24 |
| Ball | Blue | 2020-03-25 | 2020-11-14 |
| Ball | Red | 2020-11-15 | 9999-12-31 |
| Doll | Yellow | 2020-01-01 | 2020-03-24 |
| Doll | Green | 2020-03-25 | 2020-11-14 |
| Doll | Pink | 2020-11-15 | 9999-12-31 |
Note that the Item Ball at first has the color Red, then Blue and then goes back to Red. That makes things a bit more complicated, from what I have learned.
Thanks for your help.
Your data is very regular. You seem to just want to combine adjacent, tiled, records that have no overlaps or gaps. However the following handles gaps and more general overlaps:
You are looking for islands of rows in the original data where the "islands" have the same item, color, and adjacent dates. This determines where islands start by looking at the previous row for the same item and color. If there is no such row or the row ends before the current row begins, then the current row is the beginning of an island.
The
grp
is then the cumulative sum of "island beginnings", and that can be used for aggregating and getting the final results.Your specific data is quite constrained -- perfectly tiled with one row ending the day before the next begins. You can do something very similar using
left join
:Here is a db<>fiddle illustrating both methods