Strategy to avoid joining fact tables

498 views Asked by At

I have three facts in my warehouse that can be related events in my relational db. They are PhoneContact, Appointment and Donation. A PhoneContact can result in an Appoinment and/or a Donation. I already have the Apppointment and Donation facts with their related dimensions and am now adding PhoneContact to my warehouse. The common dimension between all of these facts is the Donor dimension which describes who received the call and made the appointment and donation.

If a PhoneContact did result in an Appointment and/or Donation, I'd like to join those facts, but my understanding is that joining facts is a no-no. How would I best relate those Facts? Right now I can't think of anything better, so I'm considering putting AppointmentID and DonationID fields in my Phonecontacts fact.

More info: there are about 1.2M PhoneContacts per month but only about 100k of those result in an Appointment or Donation, so aside from not joining facts, just putting 1.1M NULLs per month into the table so I can get the 100K other events seems less than great.

1

There are 1 answers

0
Harsh Verma On

There seems to be a trade-of here between space and performance. It seems like joining would save space. On the other hand if we used a denormalized table (already joined), we might get better performance on complicated group by queries that require scanning entire tables.

Note that joining can be less expensive in some scenarios :

  • If you your tables are sorted based on the join key, joining will be less expensive (because we will use merge join algorithm).

  • If your queries yield small num of rows (eg. give me information about John), joining will be affordable with nice indices.

If you think your use case consistently falls out of the above categories and you can easily buy more disk space, creating an already joined table can help in increasing query speed.