I have a task fact table with the following columns (a rough DDL sketch is shown after the list):

  • Task ID (BIGINT)
  • User ID (BIGINT)
  • Created Timestamp (TIMESTAMP)
  • First assigned Timestamp (TIMESTAMP)
  • First assigned time taken (BIGINT)
  • First comment Timestamp (TIMESTAMP)
  • First comment time taken (BIGINT)
  • Closed Timestamp (TIMESTAMP)
  • Closed time taken (BIGINT)
  • First comment (VARCHAR(3000))
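
Roughly, the table looks something like this (column names, distribution key and sort key below are illustrative, not the exact schema):

```sql
-- Illustrative DDL only; actual names/keys may differ
CREATE TABLE task_fact (
    task_id                   BIGINT NOT NULL,
    user_id                   BIGINT,
    created_ts                TIMESTAMP,
    first_assigned_ts         TIMESTAMP,
    first_assigned_time_taken BIGINT,        -- e.g. seconds from created to first assignment
    first_comment_ts          TIMESTAMP,
    first_comment_time_taken  BIGINT,
    closed_ts                 TIMESTAMP,
    closed_time_taken         BIGINT,
    first_comment             VARCHAR(3000)
)
DISTKEY (task_id)
SORTKEY (created_ts);
```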

As you can see, these timestamps can get updated at any point after the task is created. This means these columns will keep getting updated in Redshift.

Q1: Do I create separate tables for assignment, comment and close, and join them when needed?

These are the reports that are run (query sketches follow the list):

  • Average time taken for the different metrics (first assigned, first comment, etc.) over a time range x to y. The time range can filter on created/first-assigned/first-comment, etc.
  • Export of the data as CSV for tasks that received their first comment between times x and y.
  • The exported data needs all of the columns present here, including the first comment.
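
For example, the first report is essentially this kind of query, and the export is done with something like UNLOAD (sketches only; dates, bucket and IAM role are placeholders):

```sql
-- Report 1 (sketch): average time to first comment for tasks created in a range
SELECT AVG(first_comment_time_taken) AS avg_first_comment_time
FROM   task_fact
WHERE  created_ts BETWEEN '2023-01-01' AND '2023-03-31';

-- Report 2 (sketch): CSV export of tasks whose first comment fell in the range
UNLOAD ('SELECT * FROM task_fact
         WHERE first_comment_ts BETWEEN ''2023-01-01'' AND ''2023-03-31''')
TO 's3://my-bucket/exports/tasks_'
IAM_ROLE 'arn:aws:iam::111122223333:role/my-redshift-unload-role'
CSV HEADER;
```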

Q2: Is it okay to store the first comment here, given that Redshift is columnar storage? This text field is only used for the CSV export.

Answer from NickW:

You wouldn't create multiple tables; you would create a single fact table with an FK to an Event Type dimension, e.g. the following (a DDL sketch is shown after the list):

  • Task ID (BIGINT)
  • User ID (BIGINT)
  • Event ID (BIGINT)
  • Event Timestamp (TIMESTAMP)
  • Comment (VARCHAR(3000))
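
A minimal sketch of that shape (names, event IDs and keys are assumptions):

```sql
-- Sketch only; names and keys are illustrative
CREATE TABLE event_type_dim (
    event_id   BIGINT NOT NULL,       -- e.g. 1 = created, 2 = first assigned, 3 = first comment, 4 = closed
    event_name VARCHAR(50)
);

CREATE TABLE task_event_fact (
    task_id         BIGINT NOT NULL,
    user_id         BIGINT,
    event_id        BIGINT NOT NULL,  -- FK to event_type_dim
    event_timestamp TIMESTAMP,
    comment         VARCHAR(3000)     -- populated only for comment events
)
DISTKEY (task_id)
SORTKEY (event_timestamp);
```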

But if you did this, you obviously couldn't also hold the pre-calculated metrics (e.g. First assigned time taken) on the row.
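
For example, in the insert-only model a metric like "first assigned time taken" would have to be derived at query time, along these lines (event IDs as in the sketch above, purely illustrative):

```sql
-- Sketch: derive "first assigned time taken" per task from the event rows
SELECT task_id,
       DATEDIFF(second,
                MIN(CASE WHEN event_id = 1 THEN event_timestamp END),   -- created
                MIN(CASE WHEN event_id = 2 THEN event_timestamp END))   -- first assigned
           AS first_assigned_time_taken
FROM   task_event_fact
GROUP BY task_id;
```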

Whether it is better for the fact table to be insert-only (and therefore faster to load) with the metrics calculated at query time, or for the fact table to take updates (and therefore be slower to load) but have the metrics pre-calculated so the queries run faster, only you can know, because:

a) Only you can find out how much faster/slower each approach is

b) Only you know what "better" means for your specific circumstances