I'm looking for feedback and advice for using CONNECT BY in a hierarchical query.
Suppose I have a table that looks like this:
| id | alt_id | parent_id | parent_ext_id |
|---|---|---|---|
| A | A1 | null | null |
| B | B1 | A | null |
| C | C1 | null | A1L |
| D | null | A | null |
| E | E1 | null | B1L |
| F | null | B | null |
The "id" column is the PK and never null. "alt_id" is an "alternate id", unique and can be null. Sometimes a record refers to its parent "id" via "parent_id" and sometimes to its parent "alt_id" indirectly through a lookup table via "parent_ext_id". Never both.
Lookup table:
| ext_id | alt_id |
|---|---|
| A1L | A1 |
| B1L | B1 |
Suppose I'm given ext_id 'A1L' and I want its descendents. E.g.
| depth | descendent_id | parent |
|---|---|---|
| 1 | B | A |
| 1 | C | A |
| 1 | D | A |
| 2 | E | B |
| 2 | F | B |
Here's an example query, which works, but I'm wondering if the CONNECT BY clause is the best it can be because I've had some performance issues with it when I try it with real data...
WITH hierarchy_table AS (
SELECT 'A' id, 'A1' alt_id, null parent_id, null parent_ext_id FROM dual UNION
SELECT 'B' id, 'B1' alt_id, 'A' parent_id, null parent_ext_id FROM dual UNION
SELECT 'C' id, 'C1' alt_id, null parent_id, 'A1L' parent_ext_id FROM dual UNION
SELECT 'D' id, null alt_id, 'A' parent_id, null parent_ext_id FROM dual UNION
SELECT 'E' id, 'E1' alt_id, null parent_id, 'B1L' parent_ext_id FROM dual UNION
SELECT 'F' id, null alt_id, 'B' parent_id, null parent_ext_id FROM dual
),
lookup_table AS (
SELECT 'A1L' ext_id, 'A1' alt_id FROM dual UNION
SELECT 'B1L' ext_id, 'B1' alt_id FROM dual
),
root_node AS (
SELECT * FROM hierarchy_table
WHERE alt_id = (SELECT alt_id FROM lookup_table WHERE ext_id='A1L')
)
SELECT LEVEL AS depth, id,
CASE
WHEN parent_id IS NOT NULL THEN parent_id
ELSE (
SELECT id FROM hierarchy_table WHERE alt_id = (
SELECT alt_id FROM lookup_table WHERE ext_id=ht.parent_ext_id)
)
END AS parent
FROM hierarchy_table ht
-- Start with the children
START WITH parent_ext_id = 'A1L' OR parent_id = (SELECT id FROM root_node)
CONNECT BY NOCYCLE (
(parent_id IS NOT NULL AND PRIOR id = parent_id)
OR
(parent_ext_id IS NOT NULL AND PRIOR alt_id = (
SELECT alt_id FROM lookup_table WHERE ext_id=parent_ext_id))
) ORDER BY depth, id
When I leave out either of the two conditions in the CONNECT BY clause I have no performance issues, but of course I don't get the results I want.
Try by creating an intermediary view (here the CTE vhierarchy_table) LEFT JOINing the lookup_table and back the hierarchy_table: