Cluster on a spatial index

555 views Asked by At

I am trying to cluster on spatial locality (Not just create a spatial index), but SQL Server does not allow this. To create a spatial index it first wants me to create a clustered primary key, which nothing makes sense to cluster on. I want to create a spatial index and then cluster on spatial location in some way.

I have an idea to create bins that bin each geometry into a certain bin which then gets some integer. Then set that as the required clustered primary key, that way at least some of my data is clustered close together spatially.

I am kind of baffled SQL server doesnt do this already, so either I am missing out on how to do this or most likely someone has thought of this and someone can proposed a good enough solution.

I want to cluster on spatial location because I am dealing with big data and the first filter I do is by spatial location (creating tiles of maps), without clustering on spatial location my pages are now scattered based on some meaningless auto increment integer.

If a simple implementation of binning by spatial location hasn't been proposed, I figured I could just cut the bounds of my geometry into equal squares and then for each center point run a distance formula that includes all geometries that intersect that bin.

This is not specific to SQL server per say, I am looking for general approaches to solving this index/clustering on spatial location. I assume non-mssql databases may come with this functionality built in.

1

There are 1 answers

6
Ben Thul On

I don't see how this would be possible, regardless of implementation. Specifically, the idea of a clustering key is so that you (the db engine) can tell the order in which rows should be stored. This is possible with every other datatype (and combination thereof) because ultimately you can say whether a given tuple is bigger, smaller, or equal to another. What metric would you use for generalized spatial data to say that one instance is bigger or smaller than another? Size? Proximity to the origin? Some other measure? There isn't a well-defined sense of that in the general case, and so you can't do it.

But all is not lost. Just assign an arbitrary identifier to your rows (i.e. an identity column or a column populated by a sequence) and cluster on that. Then you can put a spatial index on that and go to town. Looking at your problem, if your bins are pre-defined, you can put those in another table and do a join using STIntersects. But that may be putting the cart before the horse.