Change tracking/history keeping in SQL database using TSQL script in single table

Question

Change tracking/history keeping in SQL database using TSQL script in single table

683 views Asked by OkayDataScience At 08 June 2021 at 07:33

For context I have a simple Azure SQL database and I have a theoretical dimensional model with like 12 dimensions and some fact tables that eventually should function as a DWH.

I am looking to apply historization on this dimenional model. To start simple and apply the concept of historization I want to enable it on one dimension. In my case this is the DimEmployee which looks like this with a row of sample data (it has more columns but for simplicity sake I only take these)

EmpKey	EmpName	EmpCity	DWHDatStart	DWHDatEnd	IsActive
52	John	London	02/02/2020	31/12/9999	Y

Now if lets say John moves to a different city I want this to be tracked so the wanted situation when the value of EmpCity is updated would be

EmpKey	EmpName	EmpCity	DWHDatStart	DWHDatEnd	IsActive
52	John	Amsterdam	08/06/2021	31/12/9999	Y
52	John	London	02/02/2020	08/06/2021	N

How would I go about applying this with TSQL. I assume it will have to be a trigger of some sorts but lack the specific knowledge to apply this. I've read about temporal tables and that usually when enabling historization on data you store the historical data in a separate history table. I think because of this models design there can always be expanded on this concept by adding DimEmployeeHist dimensions or whatever but for the first concept this is not needed. I don't know what value it would bring doing it in separate tables other then when you have lot's of history records so you want them all neatly stored in one place to not clutter up your main tables.

Does this have to be done in the create table script or can this be done after the table is created?

Original Q&A

There are 1 answers

**Nick.Mc** · Accepted Answer · 2021-06-09T10:23:36+00:00

Firstly: forget about triggers. Triggers are bad

This is a braindump on how I generally do this. There are many permutations on this but this should give you an idea.

This is a row in the target dimension. Of course there are many other rows

SurrogateKey	SourceKey	SourceSystem	EmpName	EmpCity	DWHDatStart	DWHDatEnd	IsActive
3678	52	HRSystem1	John	London	2020-02-20	9999-12-31	Y
3642	73	HRSystem1	Jim	Brisbane	2021-03-18	9999-12-31	Y

SurrogateKey is the key that joins to your fact and is unique in the dimension table (and should be enforced with a constraint or index)
SourceKey is the key in the source system
SourceSystem is a code for whatever system provided this record.
SourceKey + SourceSystem is unique (and should be enforced with a constraint or index)

We load data into our datawarehouse. Typically the first step is to load the input data into a staging table. So say we have this in a staging table:

EmpKey	EmpName	EmpCity	ActiveRecordSurrogateKey	Ignore
52	John	Amsterdam	NULL	NULL
73	Jim	Brisbane	NULL	NULL
7	Jack	Texas	NULL	NULL

ActiveRecord and Ignore are working columns, they don't come from the source. Every other column does come from the source but knows nothing about the dimension

There will obviously be many records in the dimension and staging table.

First exclude all records that are up to date in the dimension.

UPDATE StagingTable
SET Ignore = 'Y'
FROM StagingTable TGT
INNER JOIN DimensionTable SRC
ON TGT.EmpKey = SRC.SourceKey
AND TGT.SourceKey = 'HRSystem1'
AND TGT.EmpName=SRC.EmpName 
AND TGT.EmpCity=SRC.EmpCity
AND SRC.IsActive = 'Y'

We've identified that Jim has not changed and can be ignored

EmpKey	EmpName	EmpCity	ActiveRecordSurrogateKey	Ignore
52	John	Amsterdam	NULL	NULL
73	Jim	Brisbane	NULL	Y
7	Jack	Texas	NULL	NULL

identify all the records in the staging table that already have an active record in the dimension but have different attributes

UPDATE StagingTable
SET ActiveRecordSurrogateKey = SRC.SurrogateKey
FROM StagingTable TGT
INNER JOIN DimensionTable SRC
ON TGT.EmpKey = SRC.SourceKey
AND TGT.SourceKey = 'HRSystem1'
AND TGT.IsActive='Y'
AND (TGT.EmpName<>SRC.EmpName OR TGT.EmpCity<>SRC.EmpCity)

(These two seperate updates could be combined into one if you want.)

Now our staging table looks like this.

EmpKey	EmpName	EmpCity	ActiveRecordSurrogateKey	Ignore
52	John	Amsterdam	3678	NULL
73	Jim	Brisbane	NULL	Y
7	Jack	Texas	NULL	NULL

Now we have enough info to update the dimension. We can write some more SQL to apply to the dimension based on the helper columns.

But first, Lets specify a fixed date. This stops weird things happening if this runs before and after midnight. Or you might want to determine this some other way, like an input parameter or data in the source

DECLARE @Date DATE = GETDATE();

Now we can insert all new records (brand new or changed)

-- This line inserts new dimension records:
INSERT INTO DimensionTable (SourceKey,SourceSystem,EmpName,EmpCity, StartDate,EndDate,IsActive)
SELECT EmpKey,'HRSystem1',EmpName,EmpCity, @Date,'2999-01-01','Y' 
FROM StagingTable 
WHERE Ignore IS NULL

Now our dimension looks like this

SurrogateKey	SourceKey	SourceSystem	EmpName	EmpCity	DWHDatStart	DWHDatEnd	IsActive
3678	52	HRSystem1	John	London	2020-02-20	9999-12-31	Y
3642	73	HRSystem1	Jim	Brisbane	2021-03-18	9999-12-31	Y
3693	7	HRSystem1	Jack	Texas	2021-06-09	9999-12-31	Y
3694	52	HRSystem1	John	Amsterdam	2021-06-09	9999-12-31	Y

Now we end date existing records:

-- This line end-dates existing records:
UPDATE DimensionTable
SET DWEndDate = @Date, Active = 'N'
FROM DimensionTable TGT
INNER JOIN StagingTable SRC
ON TGT.SurrogateKey  = ActiveRecordSurrogateKey

SurrogateKey	SourceKey	SourceSystem	EmpName	EmpCity	DWHDatStart	DWHDatEnd	IsActive
3678	52	HRSystem1	John	London	2020-02-20	2021-06-09	N
3642	73	HRSystem1	Jim	Brisbane	2021-03-18	9999-12-31	Y
3693	52	HRSystem1	Jack	Texas	2021-06-09	9999-12-31	Y
3694	52	HRSystem1	John	Amsterdam	2021-06-09	9999-12-31	Y

So you basically wrap all of those T-SQL statements up in one stored procedure, add some transactions, logging, and error handling.

CREATE PROC pUpdateDimPerson
AS
BEGIN

-- All the code above

END

The stored procedure compares source (staging) and target (dimension) and does all the right things.

There are a bunch of other things to consider but this gives you an idea.

TechQA.

Change tracking/history keeping in SQL database using TSQL script in single table

There are 1 answers

Related Questions in SQL-SERVER

Related Questions in T-SQL

Related Questions in TRIGGERS

Related Questions in AZURE-SQL-DATABASE

Related Questions in HISTORICAL-DB

Popular Questions

Popular Tags

Trending Questions