SQL Server SCD II implementation quandary

109 views Asked by At

I have a patient table with north of 80 fields, of which I'll need to keep historical data on all fields. For clarification, not all fields "need" versioning, though the decision was made to simply keep a full copy of the old records.

For clarification, this is for an ETL process, which will be taking place off-hours.

There seem to be a few approaches here, though I don't know which one will work best for me, as I haven't done this before.

Option 1: use SSIS for an SCD implementation.
Pro: Since we're using SSIS anyhow to source our data, it would make sense to simply continue using the same tool
Con: I've read some articles on the performance of SCD in SSIS - example: https://chrisjarrintaylor.co.uk/2012/07/03/ssis-scd-vs-merge-statement-performance-comparison/

Option 2: Use MERGE in a tsql statement.
Pro: short to write, simple to understand - does all steps in one
Con: can be cumbersome if we try to write conditions on 80+ fields. We may also have a performance issue doing an OR on that many fields (as in, if any are different)
The following approach would work for me, though writing it that way, would mean a very long merge statement: Need help understanding alternatives to scd in SSIS

Option 3: INSERT all of the new records, use a CTE to clean up duplicates (those that have no change), an UPDATE statement to deactivate those records which changed
Pro: No messy OR statements, not too hard to write or understand - performance doesn't seem as if it'll be an issue (we've used the CTE approach in the past)
Con: Somehow, this comes across as being highly kludgy approach

Considering these approaches, or others which you may have, is there one which is easier to maintain, will scale better?

Small sampling of data - there are 80+ columns in source data (which I cannot control) and much more rows: http://rextester.com/live/AYQUT28070

0

There are 0 answers