non-binary tree search and insertion

3k views Asked by At

I searched a bit but haven't found the answer to this question..

I built a non-binary tree, so each node can have any number of children (called n-ary tree i think)

To help with searching, I gave every node a number when i built the tree, so that every node's child nodes will be bigger, and all the node to the right of the it will be bigger as well.

something like this:

tree stucture

This way I get logn time for search

The problem comes when I want to insert nodes. This model would not work if I want to insert nodes anywhere but the end.

I thought of a few ways to do it,

  1. insert the new nodes at the desired location, then update the number of all the nodes "behind" it.

  2. instead of using a single number to represent each node, use an array of numbers. The numbers in the array will represent its location on that specific level. For example, node 1 will be {0}. Node 9 will be {0,2}. Node 7 will be {0, 0, 1, 2}. Now when inserting, I just need to change the numbers on that level.

  3. forget all the numbering and just compare every node until the correct one is found. Insertion don't need to care about numbers either.

My question is, which way would be better? I'm not sure if using an array of integers to represent each node is very fast.. maybe it is still faster than the first way? Are there other ways of going about this?

Thank you in advance.

1

There are 1 answers

1
rici On

I gather that the problem you have is to assign a unique identifier to each node in such a way that you can find the node given its unique id in sublinear time.

This is not usually a problem for transient (in-memory) data structures, because typical tree implementations never move a node in memory (until it is deleted). That lets you use the address of the node as a unique identifier, which provides O(1) access. Languages other than C dress this up in an object like a tree iterator or node reference, but under the hood the principle is the same.

However, there are times when you really need to be able to attach a fixed-for-all-time identifier to a tree node, in a way which will be resilient against, for example, persisting the tree to permanent storage and then reserializing it in a different executable image.

One well-known hack is to use floating-point ids. When a new node is inserted, its id is assigned to be the average of its immediate neighbours. For the purpose of this computation, we pretend that there is a node on the left of the tree with id 0.0 and a node to the right with id 1.0, so every node has two neighbours, even if it is the new left- or right-most node. In particular, the root node is given the id 0.5, which is the average of the 0.0 and 1.0 imaginary boundary nodes.

Unfortunately, floating point precision is not infinite, and this hack works best if insertions are always at random places in the tree. If you insert all the nodes at the end, you will rapidly exhaust the floating point precision. You could periodically renumber all the nodes, but that defeats the purpose of having a permanent unchanging unique node id. (For some problem domains, it's acceptable, though.)

Of course, you don't really have to use floating point. A double on standard architecture has 53 bits of precision, which is plenty if your insertions are stochastic and very little if you always insert at the same place; you can use all 64 bits of an unsigned 64-bit integer by (conceptually) locating a fixed binary point prior to the high-order bit. The average computation works the same, except that you need to special case the computation with the 1.0 value.

This is essentially the same as your idea of labelling the nodes with a vector of indices. That scheme has the advantage of never running out of precision, and the disadvantage that the vectors can get quite long. You could also use a hybrid solution where you start a new level only when you run out of precision with the current level.