PostgreSQL Materialized Path / Ltree to hierarchical JSON-object

5.2k views Asked by At

I have this materialized path tree structure built using PostgreSQL's ltree module.

  • id1
  • id1.id2
  • id1.id2.id3
  • id1.id2.id5
  • id1.id2.id3.id4 ... etc

I can of course easily use ltree to get all nodes from the entire tree or from a specific path/subpath, but when I do that, naturally what I get is a lot of rows (which equals to an array/slice of nodes in the end.. Golang/whatever programming language you use)

What I'm after is to fetch the tree - ideally from a certain start and ending path/point - as a hieracical JSON tree object like etc

{
  "id": 1,
  "path": "1",
  "name": "root",
  "children": [
    {
      "id": 2,
      "path": "1.2",
      "name": "Node 2",
      "children": [
        {
          "id": 3,
          "path": "1.2.3",
          "name": "Node 3",
          "children": [
            {
              "id": 4,
              "path": "1.2.3.4",
              "name": "Node 4",
              "children": [

              ]
            }
          ]
        },
        {
          "id": 5,
          "path": "1.2.5",
          "name": "Node 5",
          "children": [

          ]
        }
      ]
    }
  ]
}

I know from a linear (non-hiearchical) row/array/slice resultset I can of course in Golang explode the path and make the necessary business logic there to create this json, but it'll certainly be MUCH much better if there's a handy way of achieving this with PostgreSQL directly.

So how would you in PostgreSQL output an ltree tree structure to json - potentionally from a starting to ending path?

If you don't know ltree, I guess the question could be generalized more to "Materalized path tree to hierachical json"

Also I'm playing with the thought of adding a parent_id on all nodes in addition to the ltree path, since at least then I would be able to use recursive calls using that id to fetch the json I guess... also I've thought about putting a trigger on that parent_id to manage the path (keep it updated) based on when a change in parent id happens - I know it's another question, but perhaps you could tell me your opinion as well, about this?

I hope some genius can help me with this. :)

For your convenience here's a sample create script you can use to save time:

CREATE TABLE node
(
  id bigserial NOT NULL,
  path ltree NOT NULL,
  name character varying(255),
  CONSTRAINT node_pkey PRIMARY KEY (id)
);

INSERT INTO node (path,name) 
VALUES ('1','root');

INSERT INTO node (path,name) 
VALUES ('1.2','Node 1');

INSERT INTO node (path,name) 
VALUES ('1.2.3','Node 3');

INSERT INTO node (path,name) 
VALUES ('1.2.3.4','Node 4');

INSERT INTO node (path,name) 
VALUES ('1.2.5','Node 5');
2

There are 2 answers

2
Dac0d3r On

I was able to find and slightly change it to work with ltree's materialized paths instead of parent ids like often used on adjacency tree structures.

While I still hope for a better solution, this I guess will get the job done.

I kinda feel I have to add the parent_id in addition to the ltree path, since this is of course not any way near as fast as referencing parent id's.

Well credits goes to this guy's solution, and here's my slightly modified code using ltree's subpath, ltree2text and nlevel to achieve the exact same:

WITH RECURSIVE c AS (
    SELECT *, 1 as lvl
    FROM node
    WHERE id=1
  UNION ALL
    SELECT node.*, c.lvl + 1 as lvl
    FROM node
    JOIN c ON ltree2text(subpath(node.path,nlevel(node.path)-2 ,nlevel(node.path))) = CONCAT(subpath(c.path,nlevel(c.path)-1,nlevel(c.path)),'.',node.id)
),
maxlvl AS (
  SELECT max(lvl) maxlvl FROM c
),
j AS (
    SELECT c.*, json '[]' children
    FROM c, maxlvl
    WHERE lvl = maxlvl
  UNION ALL
    SELECT (c).*, json_agg(j) children FROM (
      SELECT c, j
      FROM j
      JOIN c ON ltree2text(subpath(j.path,nlevel(j.path)-2,nlevel(j.path))) = CONCAT(subpath(c.path,nlevel(c.path)-1,nlevel(c.path)),'.',j.id)
    ) v
    GROUP BY v.c
)
SELECT row_to_json(j)::text json_tree
FROM j
WHERE lvl = 1;

There is a big problem with this solution though, so far.. see the image below for the error (Node 5 is missing):

Node 5 is missing from the JSON object

0
Herman J. Radtke III On

The reason node 5 does not show up is because it is a leaf node that is not at the max level and the subsequent join on condition excluded it.

The true base case for recursing through a tree is a node that is a leaf. By starting at the max level, that implicitly selects all leaf nodes but misses leaf nodes that occur at a lower level. Here is what we want to do in pseudo code:

for each node:
   if node is leaf, then return empty array
   else return the aggregated children

I found this hard to express in SQL though. Instead, I used the same strategy of starting from the max level and then moving up one level at a time. However, I added some code to handle the leaf node base case when I was above the max level.

Here is what I came up with:

WITH RECURSIVE c AS (
  SELECT
    name,
    path,
    nlevel(path) AS lvl
  FROM node
),
maxlvl AS (
  SELECT max(lvl) maxlvl FROM c
),
j AS (
  SELECT
    c.*,
    json '[]' AS children
  FROM c, maxlvl
  WHERE lvl = maxlvl
  UNION ALL
  SELECT
    (c).*,
    CASE
      WHEN COUNT(j) > 0 -- check if returned record is null
        THEN json_agg(j) -- if not null, aggregate
      ELSE json '[]' -- if null, then we are a leaf, so return empty array
    END AS children
  FROM (
    SELECT
      c,
      CASE
        WHEN c.path = subpath(j.path, 0, nlevel(j.path) - 1) -- c is a parent of the child
          THEN j
        ELSE NULL -- if c is not a parent, return NULL to trigger base case
      END AS j
    FROM j
    JOIN c ON c.lvl = j.lvl - 1
  ) AS v
  GROUP BY v.c
)
SELECT row_to_json(j)::text AS json_tree
FROM j
WHERE lvl = 1;

My solution only uses the path (and the derived level from the path). It does not need name or id to properly recurse.

Here is the result I get (I included a node 6 to make sure I handled multiple leaf nodes at the same level):

{
  "name": "root",
  "path": "1",
  "lvl": 1,
  "children": [
    {
      "name": "Node 1",
      "path": "1.2",
      "lvl": 2,
      "children": [
        {
          "name": "Node 5",
          "path": "1.2.5",
          "lvl": 3,
          "children": []
        },
        {
          "name": "Node 3",
          "path": "1.2.3",
          "lvl": 3,
          "children": [
            {
              "name": "Node 6",
              "path": "1.2.3.4",
              "lvl": 4,
              "children": []
            },
            {
              "name": "Node 4",
              "path": "1.2.3.4",
              "lvl": 4,
              "children": []
            }
          ]
        }
      ]
    }
  ]
}