I've been always concerned with the space usage of a radix tree, but I didn't find any helpful discussions on this.
Now let's say we have a radix tree implementation same as linux radix-tree.c, which takes a integer and use every 6 bits to index the next position in tree. I can easily think of cases where radix tree's space usage is far more than binary search trees. Please correct me if I'm wrong:
Use cases: (0,1,1,1,1), (1,1,1,1,1), (2,1,1,1,1), ... (63,1,1,1,1).
Here just for convenience, I use (a,b,c,d,e) to represent a 30-bit integer key, with each element standing for a 6-bit value. a is MSB and e is LSB.
Radix Tree:
For this use case, radix tree will have a height of 5, and each key will take 4 separate nodes, because they are on different subtrees of root. So there will be ((5-1) * 64 + 1) = 257 nodes.
Each node contains 2^6 = 64 pointers, so it is going to use 257 * 64 * 4Byte = 65KB
Binary Search Tree
We only care how many keys are there. In this case it has 64 keys.
Assume each BST node uses 3 Pointers per Node, it is going to use 64 * 3 * 4Byte = 768 Bytes.
Comparison
Looks radix tree is very space inefficient. It uses ~100 times space than binary search tree given same number of nodes! I don't understand why it is used even in linux kernel.
Am I missing something? Thanks.
You asked for space complexity, so let's work it out.
If we consider a non-null pointer at the leaf to be a value of interest, then it is not hard to prove by contradiction that the worst case is a fully populated tree with one value per leaf node.
If branching is N-way (in your use case 64) and height is H (in your use case 5), there are N^(H-1) leaf nodes in this tree, storing an equal number of values. The total number of nodes is
So the storage requirement measured in pointers is N times this amount.
This yields a storage efficiency of
This is the total number of pointers divided by the count of valid data pointers.
As N gets bigger, this approaches N. In your example use case, it's actually 65.01 (for N=64). So we can say the storage complexity is O(NV) where V is the number of data values to be stored.
Though we got here with first-principles analysis, it makes total sense. Storage for the leaf level of the complete tree dominates the rest by a factor of nearly N. The size of that storage is NV.
Of course the advantage of trees with enormous branching factors like this (and e.g. B-trees in databases) is that fewer node traversals are needed to get to the right leaf.
Moreover, when each traversal is a single array lookup as in the radix tree, you can't get much faster.
In your use case, a perfectly balanced binary search tree would require up to 30 comparisons with attendant branches flushing the pipeline. This compared to 5 array indexing operations could be much slower. Array indexing tends to be faster than comparison because it's non-branching code. But even if they are the same, the binary tree would only need 2^5=32 elements to cause the same amount of indexing work as a radix tree containing 2^30 elements.
To generalize this, a binary tree of 2^H elements will require the same lookup effort as an index tree capable of holding N^(H-1) elements if key comparsions and array index operations have the same cost.
As others have said, if the index bits for the top levels of the tree tend to a few common prefixes (i.e. they are the top bits of addresses of the same VM space), the worst case storage behavior of the radix tree does not occur.