I'm trying to aggregate Table 1 below into Table 2 using Excel Power Query.
The goal is to merge overlapping (contiguous) time intervals of the same group into a single row. For direct overlaps such as events 5 and 6 this was quite easy. But for group A my approach only merged event 1 with 2 and event 2 with 3, leaving two entries instead of one (see Table 1b).
The problem is the "transitive" dependence, e.g. of event 1 on event 3 via event 2. Such a chain can span more than 3 rows, so the transitive closure has to be determined.
In a programming language I could apply the current solution iteratively until no more changes occur. But how can this be done in Power Query?
Table 1 (original):
| Event ID | Group | Start | End |
|---|---|---|---|
| 1 | A | 20.01.2022 12:00:00 | 20.01.2022 12:02:00 |
| 2 | A | 20.01.2022 12:01:00 | 20.01.2022 12:04:20 |
| 3 | A | 20.01.2022 12:03:10 | 20.01.2022 12:06:00 |
| 4 | A | 20.01.2022 12:08:00 | 20.01.2022 12:10:00 |
| 5 | B | 20.01.2022 12:00:50 | 20.01.2022 12:02:00 |
| 6 | B | 20.01.2022 12:01:00 | 20.01.2022 12:05:00 |
| 7 | B | 20.01.2022 12:06:00 | 20.01.2022 12:11:00 |
Table 1b (current intermediate solution):
| Event ID | Group | Start | End |
|---|---|---|---|
| 1 | A | 20.01.2022 12:00:00 | 20.01.2022 12:04:20 |
| 2 | A | 20.01.2022 12:01:00 | 20.01.2022 12:06:00 |
| 4 | A | 20.01.2022 12:08:00 | 20.01.2022 12:10:00 |
| 5 | B | 20.01.2022 12:00:50 | 20.01.2022 12:05:00 |
| 7 | B | 20.01.2022 12:06:00 | 20.01.2022 12:11:00 |
Table 2 (desired result):
| Event ID | Group | Start | End |
|---|---|---|---|
| 1 | A | 20.01.2022 12:00:00 | 20.01.2022 12:06:00 |
| 4 | A | 20.01.2022 12:08:00 | 20.01.2022 12:10:00 |
| 5 | B | 20.01.2022 12:00:50 | 20.01.2022 12:05:00 |
| 7 | B | 20.01.2022 12:06:00 | 20.01.2022 12:11:00 |
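For reference, the logic that turns Table 1 into Table 2 can be sketched outside Power Query. The following is a minimal Python sketch, not M code: it sorts by group and start time and extends a running interval while the next row starts before the running end. The names `parse`, `merge_intervals`, `table1` and `result` are made up for illustration, and treating touching intervals (start equal to the previous end) as overlapping is an assumption.

```python
from datetime import datetime
from itertools import groupby

FMT = "%d.%m.%Y %H:%M:%S"  # timestamp format used in the tables


def parse(ts):
    return datetime.strptime(ts, FMT)


def merge_intervals(rows):
    """rows: iterable of (event_id, group, start, end) tuples.

    Returns one row per chain of overlapping intervals within a group,
    keeping the Event ID of the earliest-starting row in the chain.
    """
    merged = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))  # by group, then start
    for group, items in groupby(ordered, key=lambda r: r[1]):
        current = None
        for event_id, _, start, end in items:
            if current is not None and start <= current[3]:
                # Overlaps the running interval: extend its end if needed.
                current[3] = max(current[3], end)
            else:
                # Gap found: start a new running interval.
                current = [event_id, group, start, end]
                merged.append(current)
    return [tuple(r) for r in merged]


table1 = [
    (1, "A", parse("20.01.2022 12:00:00"), parse("20.01.2022 12:02:00")),
    (2, "A", parse("20.01.2022 12:01:00"), parse("20.01.2022 12:04:20")),
    (3, "A", parse("20.01.2022 12:03:10"), parse("20.01.2022 12:06:00")),
    (4, "A", parse("20.01.2022 12:08:00"), parse("20.01.2022 12:10:00")),
    (5, "B", parse("20.01.2022 12:00:50"), parse("20.01.2022 12:02:00")),
    (6, "B", parse("20.01.2022 12:01:00"), parse("20.01.2022 12:05:00")),
    (7, "B", parse("20.01.2022 12:06:00"), parse("20.01.2022 12:11:00")),
]

result = merge_intervals(table1)
for row in result:
    print(row[0], row[1], row[2].strftime(FMT), row[3].strftime(FMT))
```

Because the rows are sorted by start time first, a single pass is enough here: any transitive chain shows up as a run of consecutive rows that each start before the running end.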
Edit
An example that wasn't fully aggregated by the provided solution:
| Event ID | Group | Start | End |
|---|---|---|---|
| 1 | A | 20.01.2022 12:02:12 | 20.01.2022 12:05:34 |
| 2 | A | 20.01.2022 12:02:54 | 20.01.2022 12:05:37 |
| 3 | A | 20.01.2022 12:05:36 | 20.01.2022 12:05:49 |
| 4 | A | 20.01.2022 12:05:45 | 20.01.2022 12:07:22 |
| 5 | A | 20.01.2022 12:06:03 | 20.01.2022 12:06:10 |
results in (previous solution):
| Event ID | Group | Start | End |
|---|---|---|---|
| 1 | A | 20.01.2022 12:02:12 | 20.01.2022 12:07:22 |
| 5 | A | 20.01.2022 12:02:54 | 20.01.2022 12:07:22 |
results in (accepted answer):
| Event ID | Group | Start | End |
|---|---|---|---|
| 1 | A | 20.01.2022 12:02:12 | 20.01.2022 12:07:22 |
EDIT: See the better, later answer at "Aggregate overlapping time intervals, calculate outage times, improve performance".
Step 1: Create a separate query, name it process, then close and load it before proceeding
Step 2: Code for the table query that uses the above function:
The function calls itself until no more changes can be made, so it should work for chains of any depth.
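The "calls itself until nothing changes" idea can be illustrated with a small Python sketch. This is not the M code of the process function; `merge_until_stable`, `ts` and `example` are hypothetical names, and touching intervals are assumed to count as overlapping. Each call merges the first overlapping neighbouring pair it finds and then recurses on the shortened list; when a pass finds no pair, the list is stable and is returned.

```python
from datetime import datetime


def ts(text):
    # Timestamps in the same format as the tables.
    return datetime.strptime(text, "%d.%m.%Y %H:%M:%S")


def merge_until_stable(rows):
    """rows: list of (event_id, group, start, end) tuples."""
    rows = sorted(rows, key=lambda r: (r[1], r[2]))  # by group, then start
    for i in range(len(rows) - 1):
        a, b = rows[i], rows[i + 1]
        if a[1] == b[1] and b[2] <= a[3]:
            # Same group and b starts before a ends: merge, then recurse.
            combined = (a[0], a[1], a[2], max(a[3], b[3]))
            return merge_until_stable(rows[:i] + [combined] + rows[i + 2:])
    return rows  # no overlapping neighbours left: fixed point reached


# The five rows from the question's edit.
example = [
    (1, "A", ts("20.01.2022 12:02:12"), ts("20.01.2022 12:05:34")),
    (2, "A", ts("20.01.2022 12:02:54"), ts("20.01.2022 12:05:37")),
    (3, "A", ts("20.01.2022 12:05:36"), ts("20.01.2022 12:05:49")),
    (4, "A", ts("20.01.2022 12:05:45"), ts("20.01.2022 12:07:22")),
    (5, "A", ts("20.01.2022 12:06:03"), ts("20.01.2022 12:06:10")),
]

stable = merge_until_stable(example)
print(stable)
```

In this sketch the recursion depth grows by one per merge, so very long chains would be better served by a loop. The point is only to show why repeating a pairwise merge until a fixed point handles chains of any length.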