Updating of Fact tables

264 views Asked by At

I have a flatfile resources that were extracted into facts and dimensions. Some dimensions also comes from db resources. The transformation process is set on as needed basis (if there are new/updated from flatfiles). The problem is this, some data reference doesn't exist or match on the dimension based on db resources so the foreign key id value on the fact is set to default (zero if no matching data).

How can i perform an update on the facts if the said dimension (db resource) has been updated? What was the best practice/routine for this kind of scenario?

This is the sample illustration

Flatfile source                           product list (db source)
--------------------------------          ------------------------------
| product name | year | volume |          | prodcode |  name           |
--------------------------------          ------------------------------
| apple        | 2020 |  1000  |          | 001      | apple           |
| watermelon   | 2020 |  2000  |          | 002      | mango           |
--------------------------------          ------------------------------

Fact/Dimension

production_fact                           dim_product
-------------------------------          ---------------------------
| fk_product| fk_date| volume |          | id | prodcode |  name   |
-------------------------------          --------------------------|
| 2         |  d001  |  1000  |          |  1 |  n/a      | n/a    |
| 1         |  d001  |  2000  |          |  2 |  001      | apple  |
-------------------------------          |  3 |  002      | mango  |
                                         ---------------------------

If the product list will be updated (003 watermelon), should i replace the dim_product row#1 with the new value?

1

There are 1 answers

4
Nick.Mc On BEST ANSWER

Based on your example, this is the way it should work:

Note: I would expect prodcode to be be in flatfile, not product name. Is this really how your data looks? Anyway I will proceed.

First set of data arrives. Watermelon is in fact but not dimension.

Flatfile source                           product list (db source)
--------------------------------          ------------------------------
| product name | year | volume |          | prodcode |  name           |
--------------------------------          ------------------------------
| apple        | 2020 |  1000  |          | 001      | apple           |
| watermelon   | 2020 |  2000  |          | 002      | mango           |
--------------------------------          ------------------------------

We load a dimension record but it won't have any attribute values. (As I said I would normally expect the code to be in the fact input data but that's fine we'll go with description). This will of course require some logic to find dimensions that are in fact but not in dimensions.

production_fact                           dim_product
-------------------------------      ------------------------------------------------
| fk_product| fk_date| volume |      | id | prodcode |  name       | weight |colour |
-------------------------------      ------------------------------------------------
| 2         |  d001  |  1000  |      |  1 |  n/a      | n/a        | n/a    | n/a   |
| 4         |  d001  |  2000  |      |  2 |  001      | apple      | 200mg  | red   |
-------------------------------      |  3 |  002      | mango      | 400mg  | yellow|
                                     |  4 |  ?        | watermelon | ?      |   ?   |
                                     ------------------------------------------------

So we have dimension SK 4 which is a legitimate dimension record except it's missing a load of attributes.

Later, the dimension arrives. We know what it's meant to match on so we update the existing dimension which was missing data.

 product list (db source)
-----------------------------------------------
| prodcode |  name           | weight |colour |
--------------------------------------|-------|
| 003      | watermelon      | 1kg    | green |
-----------------------------------------------


------------------------------------------------
| id | prodcode |  name       | weight |colour |
------------------------------------------------
|  1 |  n/a      | n/a        | n/a    | n/a   |
|  2 |  001      | apple      | 200mg  | red   |
|  3 |  002      | mango      | 400mg  | yellow|
|  4 |  003      | watermelon | 1kg    | green |
------------------------------------------------

You want to avoid ever updating large facts. Updating smaller dimensions is a much better idea

BTW this is a type 1 dimension. You can take the same appriach with a SCD except that you wouldn't count the first version of the dimension, you'd just overwrite it.