I have built a d-dimensional KD-Tree and want to do a range search on it. Wikipedia mentions range search in KD-Trees, but doesn't describe the implementation or algorithm in any way. Can someone please help me with this? If not for arbitrary d, any help for at least d = 2 and d = 3 would be great. Thanks!
How to implement range search in KD-Tree
Asked by Ankit Kumar. There are 2 answers.
This is my solution for a KD-tree where each node stores a point (so not just the leaves). (Note that adapting it to a tree where points are stored only in the leaves is really easy.)
I leave some of the optimizations out to reduce the complexity of the solution, and explain them at the end.
The get_range function has varargs at the end, and can be called like
x1, y1, x2, y2 or
x1, y1, z1, x2, y2, z2 etc., where the low values of the range are given first and then the high values.
(You can use as many dimensions as you like.)
static public <T> void get_range(K_D_Tree<T> tree, List<T> result, float... range) {
    if (tree.root == null) return;
    float[] node_region = new float[tree.DIMENSIONS * 2];
    for (int i = 0; i < tree.DIMENSIONS; i++) {
        node_region[i] = -Float.MAX_VALUE;
        node_region[i+tree.DIMENSIONS] = Float.MAX_VALUE;
    }
    _get_range(tree, result, tree.root, node_region, 0, range);
}
The node_region represents the region covered by the node. We start as large as possible, because for all we know this could be the region we are dealing with.
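Both range and node_region use the same flat layout: the first half of the array holds the low bounds and the second half holds the high bounds. As a small illustration of that layout only, here is a sketch with a hypothetical helper class (RegionLayout and its methods are not part of the answer's code):

```java
// Hypothetical helper illustrating the flat region layout used above:
// [low_0, ..., low_{d-1}, high_0, ..., high_{d-1}].
final class RegionLayout {
    private RegionLayout() {}

    // Low bound along `axis`: stored in the first half of the array.
    static float low(float[] region, int dimensions, int axis) {
        return region[axis];
    }

    // High bound along `axis`: stored in the second half of the array.
    static float high(float[] region, int dimensions, int axis) {
        return region[axis + dimensions];
    }

    // Rejects arrays of the wrong length or with low > high on some axis.
    static void check(float[] region, int dimensions) {
        if (region.length != 2 * dimensions) {
            throw new IllegalArgumentException(
                "expected " + (2 * dimensions) + " values, got " + region.length);
        }
        for (int axis = 0; axis < dimensions; axis++) {
            if (low(region, dimensions, axis) > high(region, dimensions, axis)) {
                throw new IllegalArgumentException("low > high on axis " + axis);
            }
        }
    }

    // The "as large as possible" starting region, as built in get_range.
    static float[] unbounded(int dimensions) {
        float[] region = new float[dimensions * 2];
        for (int i = 0; i < dimensions; i++) {
            region[i] = -Float.MAX_VALUE;
            region[i + dimensions] = Float.MAX_VALUE;
        }
        return region;
    }
}
```

A check like this could run at the top of get_range, so that a call with the wrong number of values fails early instead of reading past the end of the array.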
Here is the recursive _get_range implementation:
static public <T> void _get_range(K_D_Tree<T> tree, List<T> result, K_D_Tree_Node<T> node, float[] node_region, int dimension, float[] target_region) {
    // Cycle through the split dimensions.
    if (dimension == tree.DIMENSIONS) dimension = 0;
    if (_contains_region(tree, node_region, target_region)) {
        // The node's region is fully contained: take the whole branch.
        _add_whole_branch(node, result);
    }
    else {
        float value = _value(tree, dimension, node);
        if (node.left != null) {
            // Left child: same region, but the high bound on this dimension becomes the split value.
            float[] node_region_left = new float[tree.DIMENSIONS*2];
            System.arraycopy(node_region, 0, node_region_left, 0, node_region.length);
            node_region_left[dimension + tree.DIMENSIONS] = value;
            if (_intersects_region(tree, node_region_left, target_region)){
                _get_range(tree, result, node.left, node_region_left, dimension+1, target_region);
            }
        }
        if (node.right != null) {
            // Right child: same region, but the low bound on this dimension becomes the split value.
            float[] node_region_right = new float[tree.DIMENSIONS*2];
            System.arraycopy(node_region, 0, node_region_right, 0, node_region.length);
            node_region_right[dimension] = value;
            if (_intersects_region(tree, node_region_right, target_region)){
                _get_range(tree, result, node.right, node_region_right, dimension+1, target_region);
            }
        }
        // The node's own point still needs a check on all dimensions.
        if (_region_contains_node(tree, target_region, node)) {
            result.add(node.point);
        }
    }
}
One important thing that the other answer does not provide is this part:
if (_contains_region(tree, node_region, target_region)) {
    _add_whole_branch(node, result);
}
In a range search on a KD-Tree, a node's region relates to the query range in one of 3 ways:
- it is fully outside
- it intersects
- it is fully contained
Once you know a region is fully contained, then you can add the whole branch without doing any dimension checks.
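The helpers _contains_region, _intersects_region and _region_contains_node are not shown in this answer. A minimal sketch of what such checks might look like follows, assuming the flat layout described earlier (lows first, then highs), inclusive boundaries, and that _contains_region(tree, node_region, target_region) means "the node's region lies inside the target region". The class and method names here are hypothetical, and the dimension count is passed directly instead of the tree.

```java
// Sketch of the three region tests; names and argument order are assumptions.
final class RegionChecks {
    private RegionChecks() {}

    // True if `inner` lies completely inside `outer` on every axis.
    // For the "fully contained" case: outer = target region, inner = node region.
    static boolean contains(int dims, float[] outer, float[] inner) {
        for (int i = 0; i < dims; i++) {
            if (inner[i] < outer[i] || inner[i + dims] > outer[i + dims]) return false;
        }
        return true;
    }

    // True if the two regions overlap on every axis (touching edges count).
    static boolean intersects(int dims, float[] a, float[] b) {
        for (int i = 0; i < dims; i++) {
            if (a[i] > b[i + dims] || a[i + dims] < b[i]) return false;
        }
        return true;
    }

    // True if `point` lies inside `region` (boundaries inclusive).
    static boolean containsPoint(int dims, float[] region, float[] point) {
        for (int i = 0; i < dims; i++) {
            if (point[i] < region[i] || point[i] > region[i + dims]) return false;
        }
        return true;
    }
}
```

"Fully outside" is simply the case where intersects returns false, which is why the recursion never descends into such a child.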
To make this clearer, here is _add_whole_branch:
static public <T> void _add_whole_branch(K_D_Tree_Node<T> node, List<T> result) {
    result.add(node.point);
    if (node.left != null) _add_whole_branch(node.left, result);
    if (node.right != null) _add_whole_branch(node.right, result);
}
In this image, all the big white dots were added using _add_whole_branch, and only for the red dots did a check on all dimensions have to be done.

Optimization
1)
Instead of starting the _get_range function at the root node, you can first find the split node. This is the first node that has its point within the query range. To find the split node you still need to start at the root node, but the calculations are a bit cheaper (because you only go either left or right until you reach it).
2)
Right now I create new float[] node_region_left and float[] node_region_right arrays, and since this happens in a recursive function it can lead to quite a lot of arrays. However, you can reuse the one for the left for the right. I didn't do that in this example for the sake of clarity.
I can also imagine storing the region size in the node, but this takes quite a bit more memory and might lead to a lot of cache misses.
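One way to read the second optimization is to go a step further and keep a single region array for the whole search: overwrite one bound before descending into a child and restore it afterwards. The sketch below is only an illustration of that idea, not the answer's code. KdNode and InPlaceRangeSearch are hypothetical stand-ins (the K_D_Tree classes are not shown), points are plain float arrays, and boundaries are treated as inclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in node: one point per node, split axis cycles with depth.
final class KdNode {
    final float[] point;
    KdNode left, right;
    KdNode(float... point) { this.point = point; }
}

final class InPlaceRangeSearch {
    private InPlaceRangeSearch() {}

    // target uses the same layout as in the answer: lows first, then highs.
    static List<float[]> search(KdNode root, int dims, float... target) {
        List<float[]> result = new ArrayList<>();
        if (root == null) return result;
        float[] region = new float[dims * 2];          // the only region array allocated
        for (int i = 0; i < dims; i++) {
            region[i] = -Float.MAX_VALUE;
            region[i + dims] = Float.MAX_VALUE;
        }
        visit(root, dims, 0, region, target, result);
        return result;
    }

    private static void visit(KdNode node, int dims, int axis, float[] region,
                              float[] target, List<float[]> result) {
        if (contains(dims, target, region)) {          // region fully inside the query
            addBranch(node, result);
            return;
        }
        float split = node.point[axis];
        int next = (axis + 1) % dims;
        if (node.left != null) {
            float saved = region[axis + dims];
            region[axis + dims] = split;               // shrink high bound for the left child
            if (intersects(dims, region, target)) visit(node.left, dims, next, region, target, result);
            region[axis + dims] = saved;               // restore before touching the right child
        }
        if (node.right != null) {
            float saved = region[axis];
            region[axis] = split;                      // raise low bound for the right child
            if (intersects(dims, region, target)) visit(node.right, dims, next, region, target, result);
            region[axis] = saved;                      // restore for the caller
        }
        if (containsPoint(dims, target, node.point)) result.add(node.point);
    }

    private static void addBranch(KdNode node, List<float[]> result) {
        result.add(node.point);
        if (node.left != null) addBranch(node.left, result);
        if (node.right != null) addBranch(node.right, result);
    }

    private static boolean contains(int dims, float[] outer, float[] inner) {
        for (int i = 0; i < dims; i++)
            if (inner[i] < outer[i] || inner[i + dims] > outer[i + dims]) return false;
        return true;
    }

    private static boolean intersects(int dims, float[] a, float[] b) {
        for (int i = 0; i < dims; i++)
            if (a[i] > b[i + dims] || a[i + dims] < b[i]) return false;
        return true;
    }

    private static boolean containsPoint(int dims, float[] region, float[] p) {
        for (int i = 0; i < dims; i++)
            if (p[i] < region[i] || p[i] > region[i + dims]) return false;
        return true;
    }
}
```

Because every call restores the bounds it changed before returning, the caller always sees the region exactly as it passed it in, so no copies are needed.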
There are multiple variants of kd-tree. The one I used had the following specs:
... maxCapacity points.
Side note: there are also versions where each node (irrespective of whether it's internal or leaf) stores exactly one point. The algorithm below can be tweaked for those too. It's mainly the buildTree where the key difference lies.
I wrote an algorithm for this some 2 years back, thanks to the resource pointed to by @9mat.
Suppose the task is to find the number of points that lie in a given hyper-rectangle ("d" dimensions). The task could also be to list all points, or all points that lie in the given range and satisfy some other criteria, etc., but that would be a straightforward change to my code.
Define a base node class as:
Then, an internal node (non-leaf node) can look like this:
Finally, the leaf node would look like:
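The class definitions themselves are not shown above. As a rough sketch only, the three classes could look something like the following for the counting task, assuming internal nodes store just a split axis and split value, points with a coordinate less than or equal to the split value go left, and leaves hold the actual points. The class names KdBaseNode, KdInternalNode and KdLeafNode are hypothetical, and the tree-building step is left out.

```java
// Hypothetical base class: every node can count the points of its subtree
// that fall inside the hyper-rectangle [q_min, q_max] (inclusive).
abstract class KdBaseNode {
    abstract int rangeCount(double[] q_min, double[] q_max);
}

// Internal node: stores no points, only the split and two children.
final class KdInternalNode extends KdBaseNode {
    final int axis;        // dimension this node splits on
    final double split;    // assumption: coordinate <= split goes left, > split goes right
    final KdBaseNode left, right;

    KdInternalNode(int axis, double split, KdBaseNode left, KdBaseNode right) {
        this.axis = axis;
        this.split = split;
        this.left = left;
        this.right = right;
    }

    @Override
    int rangeCount(double[] q_min, double[] q_max) {
        int count = 0;
        // The left subtree can only hold matches if the query starts at or below the split.
        if (q_min[axis] <= split) count += left.rangeCount(q_min, q_max);
        // The right subtree can only hold matches if the query ends above the split.
        if (q_max[axis] > split) count += right.rangeCount(q_min, q_max);
        return count;
    }
}

// Leaf node: holds the points themselves (keyCount = points.length).
final class KdLeafNode extends KdBaseNode {
    final double[][] points;

    KdLeafNode(double[]... points) {
        this.points = points;
    }

    @Override
    int rangeCount(double[] q_min, double[] q_max) {
        int count = 0;
        for (double[] p : points) {            // plain linear scan over the leaf
            boolean inside = true;
            for (int i = 0; i < p.length && inside; i++) {
                inside = q_min[i] <= p[i] && p[i] <= q_max[i];
            }
            if (inside) count++;
        }
        return count;
    }
}
```

Listing the points instead of counting them would only change the leaf: it would append each matching point to a result list passed down the recursion.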
In the code for the range query inside the leaf node, it is also possible to do a "binary search" instead of a "linear search". Since the points will be sorted along the axis given by axis, you can do a binary search to find l and r values using q_min and q_max, and then do a linear search from l to r instead of 0 to keyCount-1 (of course in the worst case it won't help, but in practice, and especially if the capacity is fairly high, this may help).
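A minimal sketch of that idea, assuming the leaf keeps its points in an array sorted in ascending order along axis. The class SortedLeafScan and its method names are hypothetical; l and r follow the description above, with r taken as an inclusive index.

```java
// Sketch: narrow the leaf scan with two binary searches along the sorted axis.
final class SortedLeafScan {
    private SortedLeafScan() {}

    // First index whose coordinate on `axis` is >= key (points.length if none).
    static int lowerBound(double[][] points, int axis, double key) {
        int lo = 0, hi = points.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (points[mid][axis] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // First index whose coordinate on `axis` is > key (points.length if none).
    static int upperBound(double[][] points, int axis, double key) {
        int lo = 0, hi = points.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (points[mid][axis] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Counts points inside [q_min, q_max]; `points` must be sorted along `axis`.
    static int rangeCount(double[][] points, int axis, double[] q_min, double[] q_max) {
        int l = lowerBound(points, axis, q_min[axis]);
        int r = upperBound(points, axis, q_max[axis]) - 1;   // inclusive end
        int count = 0;
        for (int i = l; i <= r; i++) {                       // linear scan only over [l, r]
            boolean inside = true;
            for (int d = 0; d < q_min.length && inside; d++) {
                inside = q_min[d] <= points[i][d] && points[i][d] <= q_max[d];
            }
            if (inside) count++;
        }
        return count;
    }
}
```

The remaining dimensions still need the per-point check, since the sort order only says something about one axis.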