Temporal Tables and Time Dimension in SQL Data Warehouse and Tableau

555 views Asked by At

I am in the beginning stages of redesigning our data warehouse and found temporal tables to be a pretty awesome way to do SCD type 4. This iteration of the data warehouse will feature a date dimension so that we can go to points in time comparisons using business logic (comparing specific days in fiscal quarters for example). The date dimension would have all sorts of cool business time marks like fiscal year/quarter/month and day in month/quarter/year plus all sorts of other start and end date flags signaling different business processes and 'seasons'.

The only problem I see is that I don't see how you can leverage the date dimension in a query without always using separate date fields and executing the query as 'for system_time all'. This is because you can't really do any manipulation with the date after the 'for system_time' statement.

Now I understand you could see the date to be a variable and you could do your manipulations when setting the variable. The problem I have is that Tableau doesn't allow you to execute sql scripts (like power BI does), just individual statements. So, I can't have a scenario where we set the date as a variable based on a query result. Because of this, I am not sure how I'd run a temporal query to say let's get the values on this particular day of a particular fiscal quarter. Hope that makes sense.

If I have to manually create the history tables using triggers I will, but I assume there is a better solution that my Googling has yet to find. Any ideas or articles you can point me to?


edit adding example:

Let's for example say we have a date dimension (dateTable) that has a few fields:

date
fiscalYear
fiscalDayOfYear

Now let's say I have a table (statusTable) with a few fields:

userid
status
date (FK to date dimension)

Now let's say I want to compare a specific user status' on the 20th day of the fiscal year from 2019 and 2020. Without using temporal tables it would look like:

select st.userid, st.status from statusTable st inner join dateTable dt on st.date = dt.date where dt.fiscalDayOfYear = '20' AND (dt.fiscalYear = '2020' or dt.fiscalYear = '2019')

Now if statusTable were a temporal table, it would look like this instead:

userid
status
systemtimeFrom (generated by temporal table)
systemtimeTo (generated by temporal table)

How would I do what I do in the query above with the temporal table? I've seen examples of people putting in field 'date' into the statusTable in addition to the system time fields. They have that value be automatically generated. So, in theory with that, I could join just like I did in the query above. The problem I see though is that if I query the temporal table without temporal constraints, it will only look at the latest status (since that's all that there is in the table). If I want to include the history AND the current table, I think I need to do a select that contains 'FOR SYSTEM_TIME ALL' which I feel removes the benefit of the temporal tables (since you can't do the cool time slicing temporal queries).

What I am looking to find out if there is a way to do the above query with a temporal table using some kind of magic I am not aware of. I'd love to be able to do something like (and I know this isn't valid SQL since you can only supply a date or variable in the as of clause):

select st.userid, st.status from statusTablest st for system_time as of (select date from datetable where dt.fiscalDayOfYear = '20' AND (dt.fiscalYear = '2020' or dt.fiscalYear = '2019'))

Sorry I know isn't a good example. I don't have much in the way of examples since I am designing this from scratch. I hope I get what I am trying to do across. Let me know if not and I can try to say it in a different way.

1

There are 1 answers

5
NickW On BEST ANSWER

thanks for the example and additional information. I realise that trying to come up with a real-world example for these sorts of questions is never easy but I need to point out that your example is not compatible with a dimensional model as you would never join dimensions together like that and use one dimension to filter another - dimensions are joined to, and are used to filter, fact tables.

However, with that caveat stated, to look at your example...

The non-temporal table you've shown is not going to support the type of query you want to run as it would require a record for every single date/user/status combination you might want to query for - which is obviously unsupportable. Instead you would need effective start and end dates to show the date period each combination of userid and status was applicable for. If you had these dates then your query would probably look something like this (an untested example as I don't have data to run it against):

select st.userid, st.status, st.ValidFrom, st.ValidTo 
from statusTable st 
inner join dateTable dt1 on st.ValidFrom = dt1.date 
inner join dateTable dt2 on st.ValidTo = dt2.date 
where 
(dt1.fiscalDayOfYear <= '20' AND dt2.fiscalDayOfYear >= '20') -- your fiscal day is within the coverage of the statusTable record
AND
(dt1.fiscalYear = '2020' or dt1.fiscalYear = '2019') -- limit to the 2 years in question
AND
(dt1.fiscalYear = dt2.fiscalYear); -- Attempt to filter to a single record.Probably incorrect if the ValidFrom and ValidTo dates for a record are in different years; required logic more complex than this 

So you basically have to have an SCD2-type table in order to be able to run these types of query.

A temporal table also has ValidFrom and ValidTo columns - so it is fundamentally the same as a traditional SCD2 table; the difference being that an SCD2 table has to be maintained using ETL logic whereas a temporal table is maintained automatically by the DB engine.

The logic for querying an SCD2 table and an equivalent temporal table is going to be almost identical. The main difference would be the use of the

FOR SYSTEM_TIME ...

clause - but all this is going to do is subset the data from the statusTable being used in the query and to do that you would need to be able to construct, and pass into the query, suitable start and end date parameters. If you can't do this (or the time period is so wide that producing the subset is unlikely to benefit the query) then you'd just use:

FOR SYSTEM_TIME ALL

So to hopefully clarify/answer some of the points you raised:

  • Temporal tables are a way of automatically storing history rather than having to manually maintain SCD2 tables - but the end result is effectively the same
  • Your statement "I think I need to do a select that contains 'FOR SYSTEM_TIME ALL' which I feel removes the benefit of the temporal tables (since you can't do the cool time slicing temporal queries)." indicates a possible misunderstanding regarding the point of temporal tables. Their benefit is that they hold history automatically and allow you to query it. The "the cool time slicing temporal queries" are enabled by your date dimension, not by your temporal tables, and therefore you still need to join your temporal tables to the date dimension in order to get this to work