Implement SCD Type 2 on periodic snapshot table

155 views Asked by At

Currently I have a very big table that have a snapshot of data for each month.

ID other Team Period
1 ..... A 2020-04-30
1 ..... A 2020-05-31
1 ..... A 2020-06-30
1 ..... A 2020-07-31
1 ..... B 2020-08-31
1 ..... B 2020-09-30
1 ..... C 2020-10-31
1 ..... C 2020-11-30
1 ..... C 2020-12-31

I would like to implement SCD on this table based on comparison of all listed columns. How Can I achieve it?

In this example I would like to have only 3 rows where Team column was changed and proper valid_from, valid_to columns based on period column.

2

There are 2 answers

1
marcothesane On BEST ANSWER

If you can get those who built the monthly snapshot to do otherwise, encourage them strongly to do so.

For the rest: in one query, get the previous value for team, as prev_team - I do that in a WITH clause, and name the Common table Expression w_prev.

Then I run an outer query on w_prev, filtering by the previous team being NULL or different from the current team value - and use the three-parameter variant of the LEAD() OLAP function to get the next date, or, if missing, a maximum possible date:

WITH
-- your input, don't use in final query ..
indata(id,other,team,period) AS (
          SELECT 1,'.....','A',CAST('2020-04-30' AS DATE)
UNION ALL SELECT 1,'.....','A',CAST('2020-05-31' AS DATE)
UNION ALL SELECT 1,'.....','A',CAST('2020-06-30' AS DATE)
UNION ALL SELECT 1,'.....','A',CAST('2020-07-31' AS DATE)
UNION ALL SELECT 1,'.....','B',CAST('2020-08-31' AS DATE)
UNION ALL SELECT 1,'.....','B',CAST('2020-09-30' AS DATE)
UNION ALL SELECT 1,'.....','C',CAST('2020-10-31' AS DATE)
UNION ALL SELECT 1,'.....','C',CAST('2020-11-30' AS DATE)
UNION ALL SELECT 1,'.....','C',CAST('2020-12-31' AS DATE)
)
-- end of input, replace following comma with "WITH" ..                                                                                                                                                       
,
w_prev AS (
  SELECT
    *
  , LAG(team) OVER(PARTITION BY id ORDER BY period) AS prev_team
  FROM indata
)
SELECT
  id
, other
, team
, period
, LEAD(period,1,CAST('9999-12-01' AS DATE)) OVER(PARTITION BY id ORDER BY period) AS to_period
FROM w_prev
WHERE prev_team IS NULL
   OR prev_team <> team
;
id other team period to_period
1 ..... A 2020-04-30 2020-08-31
1 ..... B 2020-08-31 2020-10-31
1 ..... C 2020-10-31 9999-12-01
0
David Browne - Microsoft On

You shouldn't do this.

You would have to store all the versioned dimension attributes on this table, and have a new version every time any one of them changed.

Instead this table should have foreign keys to the various dimension tables, each of which can use the SCD pattern if you want.

Also you shouldn't omit rows from a periodic snapshot fact, as it makes running calculations more complicated, and breaks the relationship between the fact table and the date dimension.