I'm new to Kimball dimensional modelling, and I cannot solve how to model this case. I look forward to your help on this.
Business case: Our company leases our Equipment (i.e. trucks, excavators, ... also called Asset) and Employees (who operate the Asset) to different Clients for their Projects every day. Every morning, before moving out, we perform tests to make sure our Employees and Assets are up to working standards. We call those tests compliance test.
We collect data on the compliance tests, and now I want to model that data into dimensions and facts to facilitate for our Tableau report. The report will group on Clients and Projects, and it will show the number of passed / failed tests. Report can be filtered by date range. Example:
Filter: Date = 2023-01-01
| Client | Project | Total test passed | Total test failed |
|---|---|---|---|
| C1 | Hello | 10 | 0 |
| C1 | Goodbye | 9 | 1 |
When clicking on a report line, it will expand the details like this:
| Client | Project | Test type | Test sub-type | Name (asset / employee) | Status |
|---|---|---|---|---|---|
| C1 | Hello | Employee | Water test | Anna | Passed |
| C1 | Hello | Employee | Water test | Bob | Passed |
| C1 | Hello | Employee | Alcohol test | Bob | Passed |
| C1 | Hello | Asset | Oil test | Truck01 | Failed |
| C1 | Hello | ... | .... | ....... | ...... |
I'm having hard time thinking of a suitable model to facilitate column Name (asset / employee). Here's my original schema:
consolidated_fact_test_result
- Business process: compliance test taken
- Grain: one row per test per employee/asset per day
- Schema:
- date_id (connect to dim_date.date_id)
- project_id (connect to dim_project.project_id)
- test_subtype_id (connect to dim_test_subtype.test_subtype_id)
- employee_id (connect to dim_employee.employee_id)
- asset_id (connect to dim_asset.asset_id)
- test_passed_count
dim_date: standard date dimension
dim_project: one row per project per client. Schema:
- project_id
- project_name
- client_name
dim_test_subtype: one row per test subtype. Schema:
- test_subtype_id
- test_subtype_name
- test_type_name
dim_employee: one row per employee. Schema:
- employee_id
- employee_name
- ... other attributes
dim_asset: one row per asset. Schema:
- asset_id
- asset_name
- ... other attributes
Problem with this setup is that I cannot have a column containing both employee or asset name as required in the report. I was thinking about creating another hybrid dimension containing both employee and asset name, and put its key in the fact table like this.
fact
| date_id | project_id | test_subtype_id | hybrid_id | test_passed_count |
|---|---|---|---|---|
| 20230101 | 1 | 1 | 1 | 1 |
dim_hybrid
| hybrid_id | target_type | name |
|---|---|---|
| 1 | Employee | Anna |
| 2 | Employee | Bob |
| 3 | Asset | Truck01 |
That can solve the issue, but I wonder if that's the best practice? It seems messy. Also it's likely that hybrid dimension won't be reused.
You need to run two queries, one for assets and one for employees and then UNION the results together