SQL Hierarchy Fill Down

181 views Asked by At

I have an entity table that list row per entity, the rows are ordered (using id column) to show an implied hierarchy. Note for row 'B2', 'C1' does not get filled down because that is different second level entity. The same with 'A3' last row.

I need to transform, Table_Entity (see sql definition below):

id Level_1 Level_2 Level_3 Level_4
0 1 A1 null null null
1 2 null B1 null null
2 3 null null C1 null
3 4 null B2 null null
4 5 A2 null null null
5 6 null B3 null null
6 7 null null C2 null
7 8 null null C3 null
8 9 null null null D1
9 10 A3 null null null

to, Table_Entity_OUTPUT (see SQL definition below):

id Level_1 Level_2 Level_3 Level_4
0 1 A1 null null null
1 2 A1 B1 null null
2 3 A1 B1 C1 null
3 4 A1 B2 null null
4 5 A2 null null null
5 6 A2 B3 null null
6 7 A2 B3 C2 null
7 8 A2 B3 C3 null
8 9 A2 B3 C3 D1
9 10 A3 null null null

SQL code for input and expected output tables

CREATE TABLE Table_Entity (
  id int,
  Level_1 Varchar(10),
  Level_2 Varchar(10),
  Level_3 Varchar(10),
  Level_4 Varchar(10)
);

INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (1,'A1', null, null, null);
INSERT INTO Table_Entity(id,Level_1, Level_2, Level_3, Level_4) VALUES (2, null, 'B1', null, null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (3, null, null, 'C1', null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (4, null, 'B2', null, null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (5, 'A2', null, null, null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (6, null, 'B3', null, null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (7, null, null, 'C2', null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (8, null, null, 'C3', null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (9, null, null, null, 'D1');
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (10, 'A3', null, null, null);

CREATE TABLE Table_Entity_OUTPUT (
  id int, 
  Level_1 Varchar(10),
  Level_2 Varchar(10),
  Level_3 Varchar(10),
  Level_4 Varchar(10)
);

INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (1, 'A1', null, null, null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (2, 'A1', 'B1', null, null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (3, 'A1', 'B1', 'C1', null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (4, 'A1', 'B2', null, null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (5, 'A2', null, null, null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (6, 'A2', 'B3', null, null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (7, 'A2', 'B3', 'C2', null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (8, 'A2', 'B3', 'C3', null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (9, 'A2', 'B3', 'C3', 'D1');
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (10, 'A3', null, null, null);
4

There are 4 answers

0
MatBailie On BEST ANSWER

In the absense of SKIP NULLS / IGNORE NULLS, my first thought is to create groups based on incremental changes in the relevant column AND its parent columns. Then use MAX(value) OVER (PARTITION BY group_id).

WITH
  ids AS
(
  SELECT
    *,
    COUNT(Level_1) OVER (ORDER BY id)   AS Level_1_id,
    COUNT(Level_2) OVER (ORDER BY id)   AS Level_2_id,
    COUNT(Level_3) OVER (ORDER BY id)   AS Level_3_id,
    COUNT(Level_4) OVER (ORDER BY id)   AS Level_4_id
  FROM
    Table_Entity
)
SELECT
  *,
  MAX(Level_1) OVER (PARTITION BY Level_1_id),
  MAX(Level_2) OVER (PARTITION BY Level_1_id, Level_2_id),
  MAX(Level_3) OVER (PARTITION BY Level_1_id, Level_2_id, Level_3_id),
  MAX(Level_4) OVER (PARTITION BY Level_1_id, Level_2_id, Level_3_id, Level_4_id)
FROM
  ids

id Level_1 Level_2 Level_3 Level_4 Level_1_id Level_2_id Level_3_id Level_4_id (No column name) (No column name) (No column name) (No column name)
1 A1 null null null 1 0 0 0 A1 null null null
2 null B1 null null 1 1 0 0 A1 B1 null null
3 null null C1 null 1 1 1 0 A1 B1 C1 null
4 null B2 null null 1 2 1 0 A1 B2 null null
5 A2 null null null 2 2 1 0 A2 null null null
6 null B3 null null 2 3 1 0 A2 B3 null null
7 null null C2 null 2 3 2 0 A2 B3 C2 null
8 null null C3 null 2 3 3 0 A2 B3 C3 null
9 null null null D1 2 3 3 1 A2 B3 C3 D1
10 A3 null null null 3 3 3 1 A3 null null null

fiddle

2
Error_2646 On

It feels like there is something clever to be done with pivot or cross apply, but here's a sledgehammer approach.

Rationale:

First thing is to group the hierarchies. You can do this by deriving a hierarchy start indictor, then the cumulative sum of that gives you a sort of hierarchy id.

Then you define a level for the record, because as inferred from sample records 3&4 we only want to push down, never up.

Then a sequence of inline correlated subqueries to get the imputed value for the hierarchy group, capped at the record level. You could do this with an appropriated ranged window function as well.

warning - The max() here implicit assumes you'll have some ordering where the maximum is a sensible thing, as indicated by your sample data. If this is not the case it'd be pretty simple to tweak to make it explicit.

https://dbfiddle.uk/rdoM02oI

CREATE TABLE Table_Entity (
  id int,
  Level_1 Varchar(10),
  Level_2 Varchar(10),
  Level_3 Varchar(10),
  Level_4 Varchar(10)
);

INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (1,'A1', null, null, null);
INSERT INTO Table_Entity(id,Level_1, Level_2, Level_3, Level_4) VALUES (2, null, 'B1', null, null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (3, null, null, 'C1', null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (4, null, 'B2', null, null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (5, 'A2', null, null, null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (6, null, 'B3', null, null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (7, null, null, 'C2', null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (8, null, null, 'C3', null);
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (9, null, null, null, 'D1');
INSERT INTO Table_Entity(id, Level_1, Level_2, Level_3, Level_4) VALUES (10, 'A3', null, null, null);

CREATE TABLE Table_Entity_OUTPUT (
  id int, 
  Level_1 Varchar(10),
  Level_2 Varchar(10),
  Level_3 Varchar(10),
  Level_4 Varchar(10)
);

INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (1, 'A1', null, null, null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (2, 'A1', 'B1', null, null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (3, 'A1', 'B1', 'C1', null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (4, 'A1', 'B2', null, null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (5, 'A2', null, null, null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (6, 'A2', 'B3', null, null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (7, 'A2', 'B3', 'C2', null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (8, 'A2', 'B3', 'C3', null);
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (9, 'A2', 'B3', 'C3', 'D1');
INSERT INTO Table_Entity_OUTPUT(id, Level_1, Level_2, Level_3, Level_4) VALUES (10, 'A3', null, null, null);

with group_hierarchies as
  (
    select
    id,
    level_1,
    level_2,
    level_3,
    level_4,
    case when level_4 is not null then 4
         when level_3 is not null then 3
         when level_2 is not null then 2
         else 1
     end as record_level,
    sum(start_new_hierarchy) over ( 
       order by id
       rows between unbounded preceding and current row) as h_group
    from (
    select case when level_1 is not null 
                 and coalesce(level_2,level_3,level_4) is null then 1
                else 0 
            end as start_new_hierarchy,
           * 
      from Table_Entity
    ) dummy_alias
  )
select id,
       (
         select level_1
           from group_hierarchies t2
          where t1.h_group = t2.h_group
            and t2.record_level = 1
       ) as level_1_impute,
       case when t1.record_level >= 2 then
       (
         select max(level_2)
           from group_hierarchies t2
          where t1.h_group = t2.h_group
            and t2.record_level = 2
            and t1.id >= t2.id
          group
             by t2.h_group
       ) end as level_2_impute,
       case when t1.record_level >= 3 then
       (
         select max(level_3)
           from group_hierarchies t2
          where t1.h_group = t2.h_group
            and t2.record_level = 3
            and t1.id >= t2.id
          group
             by t2.h_group
       ) end as level_3_impute,
       case when t1.record_level >= 4 then
       (
         select max(level_4)
           from group_hierarchies t2
          where t1.h_group = t2.h_group
            and t2.record_level = 4
            and t1.id >= t2.id
          group
             by t2.h_group
       ) end as level_4_impute
  from group_hierarchies t1;
id level_1_impute level_2_impute level_3_impute level_4_impute
1 A1 null null null
2 A1 B1 null null
3 A1 B1 C1 null
4 A1 B2 null null
5 A2 null null null
6 A2 B3 null null
7 A2 B3 C2 null
8 A2 B3 C3 null
9 A2 B3 C3 D1
10 A3 null null null
4
ValNik On

Solution with last_value ... ignore nulls an elegant solution, but available only from version 2022.

The idea of the solution is sort only by Id (or use index on Id).

Tested for SQL Server from 2014 to 2022.

with t1 as(
select *
  ,max(case when Level_1 is null then 0 else rn end)over(order by id) L1_p
  ,max(case when Level_2 is null then 0 else rn end)over(order by id) L2_p
  ,max(case when Level_3 is null then 0 else rn end)over(order by id) L3_p
  ,max(case when Level_4 is null then 0 else rn end)over(order by id) L4_p
from  ( select *,row_number()over(order by id) rn   from Table_Entity )t0
)
,t2 as(
select *
  ,L1_p L1_px
  ,case when L2_p<L1_p then L1_p 
   else L2_p 
   end L2_px
  ,case when L3_p<L2_p then L2_p 
        when L3_p<L1_p then L1_p 
   else L3_p end L3_px
  ,case when L4_p<L3_p then L3_p 
        when L4_p<L2_p then L2_p 
        when L4_p<L1_p then L1_p
   else L4_p end L4_px
from t1
)
,t3 as (
select *
  ,lag(Level_1,rn-L1_px)over(order by id) L1_h
  ,lag(Level_2,rn-L2_px)over(order by id) L2_h
  ,lag(Level_3,rn-L3_px)over(order by id) L3_h
  ,lag(Level_4,rn-L4_px)over(order by id) L4_h
from t2
)
select * from t3;
id Level_1 Level_2 Level_3 Level_4 L1_h L2_h L3_h L4_h L1_px L2_px L3_px L4_px rn
1 A1 null null null A1 null null null 1 1 1 1 1
2 null B1 null null A1 B1 null null 1 2 1 1 2
3 null null C1 null A1 B1 C1 null 1 2 3 1 3
4 null B2 null null A1 B2 null null 1 4 4 1 4
5 A2 null null null A2 null null null 5 5 5 5 5
6 null B3 null null A2 B3 null null 5 6 5 5 6
7 null null C2 null A2 B3 C2 null 5 6 7 5 7
8 null null C3 null A2 B3 C3 null 5 6 8 5 8
9 null null null D1 A2 B3 C3 D1 5 6 8 9 9
10 A3 null null null A3 null null null 10 10 10 10 10

Fiddle

0
MatBailie On

@valnik answer, with optimisations...

WITH
  t0 AS
(
  SELECT
    *
   ,ROW_NUMBER() OVER (ORDER BY id)            AS rn
   ,CASE WHEN Level_1 IS NOT NULL THEN 1
         WHEN Level_2 IS NOT NULL THEN 2
         WHEN Level_3 IS NOT NULL THEN 3
         WHEN Level_4 IS NOT NULL THEN 4 END   AS level_id
  FROM
    Table_Entity
),
  t1 AS
(
  SELECT
    *
   ,MAX(l1_row) OVER (ORDER BY id) AS L1_p
   ,MAX(l2_row) OVER (ORDER BY id) AS L2_p
   ,MAX(l3_row) OVER (ORDER BY id) AS L3_p
  from
    t0
  CROSS APPLY
  (
    SELECT
      CASE WHEN level_id <= 1 THEN rn ELSE 0 END
     ,CASE WHEN level_id <= 2 THEN rn ELSE 0 END
     ,CASE WHEN level_id <= 3 THEN rn ELSE 0 END
  )
    AS source(l1_row, l2_row, l3_row)
)
SELECT
   *
  ,LAG(Level_1,rn-L1_p) OVER (ORDER BY id) L1_h
  ,LAG(Level_2,rn-L2_p) OVER (ORDER BY id) L2_h
  ,LAG(Level_3,rn-L3_p) OVER (ORDER BY id) L3_h 
  ,Level_4
FROM
  t1
ORDER BY
  id
id Level_1 Level_2 Level_3 Level_4 rn level_id l1_row l2_row l3_row L1_p L2_p L3_p L1_h L2_h L3_h Level_4
1 A1 null null null 1 1 1 1 1 1 1 1 A1 null null null
2 null B1 null null 2 2 0 2 2 1 2 2 A1 B1 null null
3 null null C1 null 3 3 0 0 3 1 2 3 A1 B1 C1 null
4 null B2 null null 4 2 0 4 4 1 4 4 A1 B2 null null
5 A2 null null null 5 1 5 5 5 5 5 5 A2 null null null
6 null B3 null null 6 2 0 6 6 5 6 6 A2 B3 null null
7 null null C2 null 7 3 0 0 7 5 6 7 A2 B3 C2 null
8 null null C3 null 8 3 0 0 8 5 6 8 A2 B3 C3 null
9 null null null D1 9 4 0 0 0 5 6 8 A2 B3 C3 D1
10 A3 null null null 10 1 10 10 10 10 10 10 A3 null null null

fiddle