Clustering or Filtering points in WGS84 Coordinates


I'm trying to solve the following problem. I have a point, which can be a player, and several objects around it, some farther away and some nearer. I want to exclude the objects that are farther away and include the nearer ones, for example based on distance. How would one cluster or filter the objects? I'm thinking about spatial partitioning. The objects are in geographic coordinates. The number of objects can be 10,000.


There is 1 answer

Felix Lauer (best answer)

If every single point is allowed to move, updates might get expensive for kd-trees or similar adaptive structures. I guess I would go for a static partitioning approach, e.g. divide the space into a set of cells (square or rectangular) and for each cell store references to the contained points along with the maximum and minimum coordinates of the set of contained points. When points are moving, you can trivially compute the cell they are currently in. When it comes to distance calculation, you just determine the relevant cells and then compute the distances to their contained points in linear time.
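
To make the idea concrete, here is a minimal sketch of such a static grid in Python. It is one possible reading of the approach, not a definitive implementation. The class `GeoGrid`, its methods, and the cell size of 0.01 degrees are made up for illustration, and distances use the haversine formula on a sphere, which is only an approximation of WGS84. It also assumes the query stays away from the poles and the antimeridian.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation of WGS84


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


class GeoGrid:
    """Hypothetical fixed grid over latitude/longitude (cell size in degrees)."""

    def __init__(self, cell_deg=0.01):
        self.cell_deg = cell_deg
        self.cells = defaultdict(set)  # (row, col) -> ids of contained objects
        self.positions = {}            # id -> (lat, lon)

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def upsert(self, obj_id, lat, lon):
        """Insert or move an object; its cell is computed directly from its position."""
        new_cell = self._cell(lat, lon)
        old = self.positions.get(obj_id)
        if old is not None:
            old_cell = self._cell(*old)
            if old_cell != new_cell:
                self.cells[old_cell].discard(obj_id)
                if not self.cells[old_cell]:
                    del self.cells[old_cell]
        self.cells[new_cell].add(obj_id)
        self.positions[obj_id] = (lat, lon)

    def within(self, lat, lon, radius_m):
        """Return (distance, id) pairs within radius_m, nearest first."""
        ang = radius_m / EARTH_RADIUS_M
        dlat = math.degrees(ang)
        # Longitude half-width of the circle's bounding box (grows with latitude).
        ratio = math.sin(ang) / max(math.cos(math.radians(lat)), 1e-12)
        dlon = 180.0 if ratio >= 1.0 else math.degrees(math.asin(ratio))
        r0, c0 = self._cell(lat - dlat, lon - dlon)
        r1, c1 = self._cell(lat + dlat, lon + dlon)
        hits = []
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                for obj_id in self.cells.get((r, c), ()):
                    d = haversine_m(lat, lon, *self.positions[obj_id])
                    if d <= radius_m:
                        hits.append((d, obj_id))
        hits.sort()
        return hits


# Usage: a player in Berlin, two nearby objects and one far away in Munich.
grid = GeoGrid()
grid.upsert("a", 52.5201, 13.4051)
grid.upsert("b", 52.5300, 13.4050)
grid.upsert("c", 48.1374, 11.5755)
near_before = grid.within(52.5200, 13.4050, 2000)  # contains a and b, not c
grid.upsert("c", 52.5205, 13.4050)                 # c moves next to the player
near_after = grid.within(52.5200, 13.4050, 2000)   # now contains c as well
```

Moving an object only touches the two sets involved, and a query only visits the cells that overlap the bounding box of the search circle.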

I see four basic advantages with this approach:

  1. By looking at the current min and max coordinates stored for each cell, you can easily determine whether or not it's empty and, if not, whether the whole set of contained points is relevant for your player's current position.

  2. You can organize the static cells in a tree structure (e.g. a Quadtree) with perfect balancing. For each inner node of the tree you store and update the combined min and max coordinates of its child nodes. Note that updates are quite inexpensive because the tree's structure is not affected at all.

  3. You don't need to sort your points (as would be necessary for other structures or specific implementations). This could save you a lot of performance if objects are moving rapidly.

  4. Building and maintaining the data structure is simple. You don't have to rack your brain over exotic test cases and complicated structure updates.
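
The min/max test from the first point can be sketched as follows. This is only an illustration under simplifying assumptions: the names `CellBounds` and `classify` are hypothetical, and coordinates are taken to be planar x/y values in metres (for example after a local projection of the WGS84 positions), so that plain Euclidean distance applies. The same bounds could be stored in the inner nodes of a quadtree and combined from the child nodes.

```python
import math
from dataclasses import dataclass


@dataclass
class CellBounds:
    """Min/max coordinates of the points currently stored in one cell."""
    min_x: float = math.inf
    min_y: float = math.inf
    max_x: float = -math.inf
    max_y: float = -math.inf

    @classmethod
    def from_points(cls, points):
        b = cls()
        for x, y in points:
            b.min_x, b.max_x = min(b.min_x, x), max(b.max_x, x)
            b.min_y, b.max_y = min(b.min_y, y), max(b.max_y, y)
        return b

    def is_empty(self):
        return self.min_x > self.max_x


def min_dist(b, px, py):
    """Smallest possible distance from (px, py) to any point inside the bounds."""
    dx = max(b.min_x - px, 0.0, px - b.max_x)
    dy = max(b.min_y - py, 0.0, py - b.max_y)
    return math.hypot(dx, dy)


def max_dist(b, px, py):
    """Largest possible distance from (px, py) to any point inside the bounds."""
    dx = max(abs(px - b.min_x), abs(px - b.max_x))
    dy = max(abs(py - b.min_y), abs(py - b.max_y))
    return math.hypot(dx, dy)


def classify(b, px, py, radius):
    """Decide how a cell must be handled for a radius query around (px, py)."""
    if b.is_empty() or min_dist(b, px, py) > radius:
        return "skip"        # no contained point can be within the radius
    if max_dist(b, px, py) <= radius:
        return "take_all"    # every contained point is within the radius
    return "check_each"      # only here are per-point distances needed


bounds = CellBounds.from_points([(10.0, 10.0), (20.0, 30.0)])
```

With these bounds and a player at the origin, a radius of 5 skips the cell, a radius of 100 accepts all of its points without further checks, and a radius of 20 falls back to checking each point.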

There are, of course, some drawbacks in choosing a non-adaptive data structure because it's, well, non-adaptive. For example, you depend heavily on the grid cells' size. If you choose it too small (worst case: one point per cell), the tree's depth bloats up and traversing gets expensive. On the other hand, if you choose it too large (worst case: at some point, all points are in the same cell), you will perform many unneeded and potentially expensive distance calculations.

All in all, it really depends on the kind of data you have. The proposal I gave you should give reasonably good results, but there probably are more efficient ways to do it. If you have enough time, implement both an adaptive and a static partitioning approach, come up with some representative tests and compare the two.

Hope this helps ;)