I have a flatfile resources that were extracted into facts and dimensions. Some dimensions also comes from db resources. The transformation process is set on as needed basis (if there are new/updated from flatfiles). The problem is this, some data reference doesn't exist or match on the dimension based on db resources so the foreign key id value on the fact is set to default (zero if no matching data).
How can i perform an update on the facts if the said dimension (db resource) has been updated? What was the best practice/routine for this kind of scenario?
This is the sample illustration
Flatfile source product list (db source)
-------------------------------- ------------------------------
| product name | year | volume | | prodcode | name |
-------------------------------- ------------------------------
| apple | 2020 | 1000 | | 001 | apple |
| watermelon | 2020 | 2000 | | 002 | mango |
-------------------------------- ------------------------------
Fact/Dimension
production_fact dim_product
------------------------------- ---------------------------
| fk_product| fk_date| volume | | id | prodcode | name |
------------------------------- --------------------------|
| 2 | d001 | 1000 | | 1 | n/a | n/a |
| 1 | d001 | 2000 | | 2 | 001 | apple |
------------------------------- | 3 | 002 | mango |
---------------------------
If the product list will be updated (003 watermelon), should i replace the dim_product row#1 with the new value?
Based on your example, this is the way it should work:
Note: I would expect prodcode to be be in flatfile, not product name. Is this really how your data looks? Anyway I will proceed.
First set of data arrives. Watermelon is in fact but not dimension.
We load a dimension record but it won't have any attribute values. (As I said I would normally expect the code to be in the fact input data but that's fine we'll go with description). This will of course require some logic to find dimensions that are in fact but not in dimensions.
So we have dimension SK 4 which is a legitimate dimension record except it's missing a load of attributes.
Later, the dimension arrives. We know what it's meant to match on so we update the existing dimension which was missing data.
You want to avoid ever updating large facts. Updating smaller dimensions is a much better idea
BTW this is a type 1 dimension. You can take the same appriach with a SCD except that you wouldn't count the first version of the dimension, you'd just overwrite it.