Given a recursive query that starts with a child, how can I eliminate sibling and parent rows?

1k views Asked by At

I have managed to build a recursive query which returns rows for the selected Id and all its children. This works absolutely fine for the ultimate parent, but I need it also to work correctly when the passed Id is that of one of the children, showing just the child and its children if any. Currently it still returns other child rows of the ultimate parent plus the passed child row displays twice...

As with a previous issue I have to do this using the sub-query format, because other TSQL based database engines than SQL Server may be used that do not support CTE or the WITH clause.

Desired Outcome:

Using Id 2, the correct data is returned: 2, 3, 4, 6, 7. Using Id 6, it should return 6, 7 only. Currently the query returns 6,3,4, 6,7.

Data:

ProjectId   ProjectName                             ParentId
1           Test Project                            -1
2           Test Project 2                          0
3           Test Project 2 Sub Project 1            2
4           Test Project 2 Sub Project 2            2
5           Test Project 3                          -1
6           Test Project 2 Sub Sub Project 1        3
7           Test Project 2 Sub Sub Sub Project 1    6

Query:

DECLARE @PROJECTID BIGINT = 2;

SELECT *
FROM            
(
    SELECT *
    FROM ProjectCostingProjects pcp
    WHERE pcp.[ProjectId] = @PROJECTID 
    UNION ALL
    SELECT  pcp2.*
    FROM    ProjectCostingProjects pcp2
    JOIN    ProjectCostingProjects pcp
    ON     pcp2.ParentID = pcp.ProjectId
);

Any advice or suggestions gratefully received.

2

There are 2 answers

3
devzero On BEST ANSWER

Here is an answer that uses a while loop instead.

Solution 1, using while loop and a temp table

/* Populating the temp table with the data */
DECLARE @recurse TABLE
(projectId INT,parent int);

INSERT INTO @recurse (projectId,parent) VALUES (1,-1);
INSERT INTO @recurse (projectId,parent) VALUES (2,-0);
INSERT INTO @recurse (projectId,parent) VALUES (3,2);
INSERT INTO @recurse (projectId,parent) VALUES (4,2);
INSERT INTO @recurse (projectId,parent) VALUES (5,-1);
INSERT INTO @recurse (projectId,parent) VALUES (6,3);
INSERT INTO @recurse (projectId,parent) VALUES (7,6);

DECLARE @recurse2 TABLE
(projectId INT,parent INT);

--Start by inserting the root element
INSERT INTO @recurse2 ( projectId, parent)
SELECT * FROM @recurse WHERE projectId = 2

--insert elements until all children have all children
WHILE EXISTS (SELECT * FROM @recurse WHERE parent IN (SELECT projectId FROM @recurse2) AND projectId NOT IN (SELECT projectId FROM @recurse2) )
BEGIN
    INSERT INTO @recurse2
    (
        projectId,
        parent
    )
    SELECT * FROM @recurse WHERE parent IN (SELECT projectId FROM @recurse2) AND projectId NOT IN (SELECT projectId FROM @recurse2)
END 

SELECT * FROM @recurse2

Solution 2 For performance you can create a intermidate table with the result from the while loop. This intermidiate table can be updated on an intervall, or as part of the creating an element in the main table. This could be either a part of your business logic or as a DB trigger.

Here is the code I would write if you wanted this as a scheduled job:

-- Populating the temp table with the data 
DECLARE @recurse TABLE
(projectId INT,parent int);

INSERT INTO @recurse (projectId,parent) VALUES (1,-1);
INSERT INTO @recurse (projectId,parent) VALUES (2,-0);
INSERT INTO @recurse (projectId,parent) VALUES (3,2);
INSERT INTO @recurse (projectId,parent) VALUES (4,2);
INSERT INTO @recurse (projectId,parent) VALUES (5,-1);
INSERT INTO @recurse (projectId,parent) VALUES (6,3);
INSERT INTO @recurse (projectId,parent) VALUES (7,6);

DECLARE @recurse2 TABLE
(projectId INT,parent INT, lvl int);

--Start by inserting all elements root at lvl 0
INSERT INTO @recurse2 ( projectId, parent, lvl ) SELECT projectId, parent, 0 FROM @recurse 
SELECT * FROM @recurse2

--insert elements until we have all parents for all elements
WHILE EXISTS (SELECT * FROM @recurse2 a WHERE lvl IN (SELECT TOP 1 lvl FROM @recurse2 b WHERE a.projectId = b.projectId ORDER BY lvl DESC) AND a.parent > 0 )
BEGIN
    --Insert the next parent for all elements that does not have a their top level parent allready
    INSERT INTO @recurse2 ( projectId, parent , lvl )
    SELECT  projectId, 
            (SELECT b.parent FROM @recurse b WHERE b.projectId = a.parent), 
            lvl + 1 
    FROM @recurse2 a WHERE lvl IN (SELECT TOP 1 lvl FROM @recurse2 b WHERE a.projectId = b.projectId ORDER BY lvl DESC) AND a.parent > 0 
END 

--Find all children to an element
SELECT * FROM @recurse2 WHERE parent = 2

The big advantage of solution #2 is that the performance should be REALY good for reads, probably even better than CTE's. Also it works equally well for reading from bottom to top, as top to bottom.

5
SqlZim On

So... the recursive common table expression is not merely a function of using union all, it is using the self reference of the common table expression in the second part of the union all. One does not simply replicate this recursive operation on another RDBMS by trying to put it a subquery/derived table.

If you want a recursive hierarchy traversal in SQL Server, the best option is to use the recursive common table expression.

declare @projectid bigint = 6;

;with cte as (
  select *
  from ProjectCostingProjects pcp
  where pcp.[ProjectId] = @projectid 
  union all
  select  pcp2.*
  from    ProjectCostingProjects pcp2
    inner join cte 
      on pcp2.Parentid = cte.ProjectId
)
select *
from cte;

rextester demo: http://rextester.com/XON59636

returns:

+-----------+--------------------------------------+----------+
| ProjectId |             ProjectName              | ParentId |
+-----------+--------------------------------------+----------+
|         6 | Test Project 2 Sub Sub Project 1     |        3 |
|         7 | Test Project 2 Sub Sub Sub Project 1 |        6 |
+-----------+--------------------------------------+----------+

The query you have now will (after adding an alias) return whatever row @ProjectID is, and 3,4,6,7. Because what you've written will return whichever row equals @ProjectID, and all rows that have a parent (that isn't 0 or -1), which are rows with ProjectId 3,4,6,7.

rextester demo with @ProjectId = null : http://rextester.com/VQU71307