Django treebeard what are differences between AL, NS, MP

4.6k views Asked by At

I'm trying to make a model for categorizing some objects.

I already tried using django-mptt to easily retrieve related categories, and now I'm searching different solutions to find the best one.

I can't find out though what are main differences between Materialized Path, Adjacency List and Nested Set. Wikipedia didn't give me a short answer, all I know is mptt is probably Nested Set...

Can anyone explain it to me in few words ?

1

There are 1 answers

0
Pedro Werneck On BEST ANSWER

It's easier to explain with examples than with a few words. Consider the sample tree, storing names:

William
    Jones
    Blake    
        Adams        
    Tyler    
Joseph
    Miller
    Flint

Materialized Path means each node stores its full path encoded. For instance, you can store its index using a dot as a delimiter

Name        Path
William     1
Jones       1.1
Blake       1.2
Adams       1.2.1
Tyler       1.3
Joseph      2
Miller      2.1
Flint       2.2

So, Adams knows its full name is Wiliam Blake Adams, because it has the 1.2.1 path, pointing to the 1 node at first level — William — to the 1.2 node at level 2 — Blake — and 1.2.1 node at level 3 — Adams.

Adjacency List means the tree is stored by keeping a link to some adjacent nodes. For instance, you can store who is the parent and who is the next sibling.

Name        Parent     Next
William     null       Joseph
Jones       William    Blake
Blake       William    Tyler
Adams       Blake      null
Tyler       William    null
Joseph      null       null    
Miller      Joseph     Flint
Flint       Joseph     null

Notice that it could be as simple as just storing the parent, if we don't need to keep the children of a node ordered. Now Adams can get all his ancestors recursively until null to find his full name.

Nested sets means you store each node with some index, usually a left and right value, assigned to each one as you traverse the tree in DFS order, so you know its descendants are within those values. Here's how the numbers would be assigned to the example tree:

  1 William 10
    2 Jones 3
    4 Blake 7   
        5 Adams 6
    8 Tyler 9
11 Joseph 16
    12 Miller 13 
    14 Flint 15

And it would be stored as:

Name        left   right
William     1      10
Jones       2      3
Blake       4      7
Adams       5      6
Tyler       8      9  
Joseph      11     16
Miller      12     13
Flint       14     15

So, now Adams can find its ancestors by querying who has a lower left AND a higher right value, and sort them by the left value.

Each model has its strengths and weaknesses. Choosing which one to use really depends on your application, the database you're using and what kind of operations you'll be doing most often. You can find a good comparison here.

The comparison mentions a fourth model that isn't very common (I don't know of any other implementation but my own) and very complicated to explain in a few words: Nested Interval via Matrix Encoding.

When you insert a new node in a nested set you have to re-enumerate everyone who is ahead of it in the traversal. The idea behind nested intervals is that there's an infinite number of rational numbers between any two integers, so you could store the new node as a fraction of its previous and next nodes. Storing and querying fractions can be troublesome, and this leads to the matrix encoding technique, which transforms those fractions in a 2x2 matrix and most operations can be done by some matrix algebra that isn't obvious at first sight but can be very efficient.

Treebeard has very straightforward implementations of each one of Materialized Path, Nested Sets and Adjacency Lists, with no redundancy. django-mptt actually uses a mix of Nested Sets and Adjacency Lists, since it also keeps a link to the parent and can use it to both query children more efficiently, and to rebuild the tree in case it gets messed up by someone.