Schema design in data warehousing and mining


I am a bit confused about fact and dimension tables and have not been able to resolve my doubt. I have to design a schema with one keyword table. For each keyword there is a date table and a site table (the site for which the keyword was generated). Given this scenario, I am unsure which table should be the fact table and which should be dimension tables. The keyword table contains key_id and keyword name. The date table contains month, year and week. The site table contains the name of the site to which the keyword belongs. Please suggest an architecture for this schema.

Nick.Mc On BEST ANSWER

What are you measuring?

Are you counting how many times a keyword is generated for each site? If so, all three of those tables are dimensions (assuming your date table has a row for every date, whether or not anything was generated on it). You need another table, which is your fact table, that records how many times a keyword was generated for the day (or even the hour; you should start at the lowest level of detail possible).
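To illustrate why starting at the lowest level helps, here is a minimal sketch in plain Python. The raw event list, site names and keyword values are made up for illustration and are not from the question. It counts generations per day, then rolls the daily counts up to weeks; going the other way (from weekly counts back to daily ones) would not be possible.

```python
from collections import Counter
from datetime import date

# Hypothetical raw events: one tuple per keyword generation (day, site, keyword).
events = [
    (date(2024, 1, 1), "site-a", "warehouse"),
    (date(2024, 1, 1), "site-a", "warehouse"),
    (date(2024, 1, 2), "site-a", "warehouse"),
    (date(2024, 1, 1), "site-b", "mining"),
]

# Daily grain: one fact row per (day, site, keyword) with its count.
daily_fact = Counter(events)

# Weekly grain, derived from the daily rows by summing within each ISO week.
weekly_fact = Counter()
for (day, site, keyword), count in daily_fact.items():
    iso_year, iso_week, _ = day.isocalendar()
    weekly_fact[(iso_year, iso_week, site, keyword)] += count
```

In a real warehouse the roll-up would normally be a GROUP BY over the fact table joined to the date dimension rather than application code.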

To put it another way: for a given site, can a keyword be generated more than once in a week? Was it generated 10 times in a week? Here's your fact table record:

Date_SurrogateKey    Site_SurrogateKey    Keyword_SurrogateKey      GeneratedCount
1                    6                    7                         10

In this example, 1 joins to the primary key of your Date dimension, 6 joins to the primary key of your Site dimension and 7 joins to the primary key of your Keyword dimension.
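A minimal sketch of this star schema, using Python's built-in sqlite3 module so it can be run as-is. The table and column names (dim_date, dim_site, dim_keyword, fact_keyword_generated) and the dimension row contents are assumptions for illustration; only the key values 1, 6, 7 and the count 10 come from the example record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per date, site and keyword (names are hypothetical).
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER, week INTEGER);
CREATE TABLE dim_site (site_key INTEGER PRIMARY KEY, site_name TEXT);
CREATE TABLE dim_keyword (keyword_key INTEGER PRIMARY KEY, keyword_name TEXT);

-- Fact table: one row per (date, site, keyword) holding the measure.
CREATE TABLE fact_keyword_generated (
    date_key INTEGER REFERENCES dim_date(date_key),
    site_key INTEGER REFERENCES dim_site(site_key),
    keyword_key INTEGER REFERENCES dim_keyword(keyword_key),
    generated_count INTEGER NOT NULL,
    PRIMARY KEY (date_key, site_key, keyword_key)
);

-- Illustrative dimension rows, plus the fact record from the example.
INSERT INTO dim_date VALUES (1, 2024, 1, 1);
INSERT INTO dim_site VALUES (6, 'example-site');
INSERT INTO dim_keyword VALUES (7, 'example-keyword');
INSERT INTO fact_keyword_generated VALUES (1, 6, 7, 10);
""")

# Join the fact to each dimension on its surrogate key and total the counts.
rows = conn.execute("""
SELECT d.year, d.week, s.site_name, k.keyword_name, SUM(f.generated_count)
FROM fact_keyword_generated AS f
JOIN dim_date AS d ON d.date_key = f.date_key
JOIN dim_site AS s ON s.site_key = f.site_key
JOIN dim_keyword AS k ON k.keyword_key = f.keyword_key
GROUP BY d.year, d.week, s.site_name, k.keyword_name
""").fetchall()
```

The fact table carries only keys and the measure; descriptive attributes such as the site name or week number stay in the dimensions and are reached through the joins.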