Joining date dimension multiple times? - Kimball's book on Data warehouse and Dimension Modeling

1.1k views Asked by At

I'm reading Ralph Kimball's book on Data warehouse and Dimension Modeling and in chapter 6, there is this part about dimension role playing.

Sometimes you discover other dates associated with each transaction, such as the requested ship date for the order. Each of the dates should be a foreign key in the fact table... However, you cannot simply join these two foreign keys to the same date dimension table. SQL would interpret this two-way simultaneous join as requiring both the dates to be identical, which isn’t very likely.

And I am not sure I understand the two last sentences. Does it mean you cannot join the date dimension multiple times if both dates in fact table have different value? Why?

2

There are 2 answers

0
NickW On

It’s not very well expressed but all that it is saying is that you need to alias the date dimension if you are going to join to it multiple times from different FKs in the Fact table.

This is true of any SQL statement where 2 tables are joined together more than once, it’s not specific to dimensional modelling.

0
Marmite Bomber On

The (well) hidden message here is that you need multiple joins - one for each dimension role

Say you have a fact table with entry date entry_d and booking date booking_d

This would be wrong and is probably meant in the text

select * from fact
join time_dim tim
on fact.entry_d = tim.date_d and
   fact.book_d = tim.date_d;

And this is right using two independent joins to the time dimension

select * from fact
join time_dim entr on fact.entry_d = entr.date_d  
join time_dim book on fact.book_d = book.date_d;

Note also that if you use an inner join as in the example above you shoud make some elementary validaton and clenaing of both dates. The point is: you should recognise fact rows with invalid dates (not covered in time dimension or NULL) and handle them propertly - not discard them silently in the join.

For non trivial setups, especially when the fact table is partitioned on a time column, you choose the native DATE format and not some surrogate key as for other dimensions. (This is a practical rule - not covered in the theory)