Here's the situation, in the source database we have more than 600K active rows for a dimension but in reality the business only uses 100 of them.
Unfortunately the list of values that they might use is not known and we can't manually filter on those values to populate the dimension table.
I was thinking, what if I include the dimension columns for that table in the fact table and then when we send that to staging area, just seperate it from the fact and send it to it's own table.
This way, I will only capture the values that are actually used.
P.S. They have a search function in the application that help users navigate through 600K values. it's not like a drop-down field !
Do you have a better recommendation?
Yes - you could build the Dimension from the fact staging table. A couple of things to consider: