Background
I recently discovered Google's open source S2 library for manipulating geometric shapes.
https://github.com/google/s2geometry
I'm developing an app that needs to find the K points nearest to a target point. I currently use PostgreSQL with a geospatial index on the latitude/longitude columns, but I'm exploring alternatives and S2 has caught my attention.
Questions
I have limited knowledge of the library and some questions about it. I would be grateful for any information on how practical it is to use.
Question 1) Does anyone know whether it is possible to find the K closest points to a target point using the S2 library?
Question 2) Does anyone know roughly how fast such a query would be in S2 compared with a geospatial index?
I understand that a complete answer is challenging and depends on many variables. I am simply seeking a rough guideline and the perspective of someone more experienced as a starting point.
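Answer
S2's cell IDs work much like geohashing: each point on the sphere maps to a 64-bit cell ID at whatever level you choose. That can speed up your geo lookups significantly, because finding points near a location becomes a plain hash/ID lookup.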
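To make the "just an ID lookup" point concrete, here is a sketch of how S2 cell IDs nest. It re-implements the S2 cell ID bit layout with BigInt instead of calling the Node S2 library, whose API is not assumed here; `lsbForLevel`, `cellLevel`, `parent` and `descendantRange` are illustrative helper names. The top 3 bits hold the cube face, each level adds 2 bits (a position along a Hilbert curve), and the lowest set bit marks the level. Because every descendant of a cell falls in one contiguous ID range, "all points in this cell" can be answered with a single integer range scan over stored leaf IDs.

```javascript
// Illustrative re-implementation of the S2 cell ID bit layout using BigInt.
// NOT the Node S2 library's API: all helper names here are hypothetical.
// Layout: 3 face bits, then 2 bits per level, then a trailing 1 bit whose
// position encodes the level (0 = whole cube face, 30 = leaf cell).
const MAX_LEVEL = 30;

// The trailing 1 bit carried by a cell at `level`.
function lsbForLevel(level) {
  return 1n << BigInt(2 * (MAX_LEVEL - level));
}

// Level of a cell, read from the position of its lowest set bit.
function cellLevel(id) {
  if (id <= 0n) throw new Error("not a valid cell ID");
  let trailingZeros = 0n;
  while (((id >> trailingZeros) & 1n) === 0n) trailingZeros++;
  return MAX_LEVEL - Number(trailingZeros / 2n);
}

// Ancestor of `id` at a coarser level (parentLevel <= cellLevel(id)):
// keep the bits at and above the new trailing bit, then set that bit.
function parent(id, parentLevel) {
  const lsb = lsbForLevel(parentLevel);
  return (id & -lsb) | lsb;
}

// Every descendant of `id`, down to leaf cells, has an ID in this closed range.
function descendantRange(id) {
  const lsb = id & -id; // lowest set bit
  return [id - (lsb - 1n), id + (lsb - 1n)];
}

// Usage: an arbitrary odd (therefore leaf, level-30) ID on face 0.
const leaf = 0x1234567890abcdefn;
const cell = parent(leaf, 10);
const [lo, hi] = descendantRange(cell);
console.log(cellLevel(leaf), cellLevel(cell), lo <= leaf && leaf <= hi); // 30 10 true
```

Storing each point's leaf cell ID in an indexed integer column is what lets a coarse-cell query become a range lookup rather than a geometric test.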
One method of indexing could be:
1. Index all the points you care about by their S2 cell ID at a fairly coarse level (large cells). Evaluate your points and pick the level that suits their density, based on this chart of S2 cell sizes per level.
2. At query time, convert your search point to its S2 cell at that same level, then pull every point indexed under that cell ID as a candidate.
3. (Optional, depending on the accuracy you need) Calculate the distance between each candidate and the search point, and sort by it.
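The three steps can be sketched in Node as follows. To stay self-contained and avoid guessing at a particular S2 binding's API, this sketch buckets points on a plain latitude/longitude grid (`cellDeg` degrees per side) as a stand-in for S2 cells at a single level; `gridCellKey`, `buildIndex` and `kNearestInCell` are hypothetical helpers. Unlike S2 cells, these grid cells shrink toward the poles and ignore the antimeridian, so treat it as an illustration of the bucketing idea only. Distances use the haversine formula with an approximate mean Earth radius of 6371 km.

```javascript
// Minimal sketch of steps 1-3. ASSUMPTION: a plain lat/lng grid (cellDeg degrees
// per side) stands in for S2 cells at one level; gridCellKey, buildIndex and
// kNearestInCell are hypothetical helpers, not part of any S2 library.
const EARTH_RADIUS_KM = 6371; // mean Earth radius (approximate)

// Hypothetical stand-in for "S2 cell ID at level L".
function gridCellKey(lat, lng, cellDeg) {
  return `${Math.floor(lat / cellDeg)}:${Math.floor(lng / cellDeg)}`;
}

// Great-circle distance in kilometers via the haversine formula.
function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Step 1: bucket every point under its cell key.
function buildIndex(points, cellDeg) {
  const index = new Map();
  for (const p of points) {
    const key = gridCellKey(p.lat, p.lng, cellDeg);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(p);
  }
  return index;
}

// Step 2: fetch candidates from the target's cell only.
// Step 3: rank them by true distance and keep the k closest.
function kNearestInCell(index, target, k, cellDeg) {
  const candidates = index.get(gridCellKey(target.lat, target.lng, cellDeg)) ?? [];
  return candidates
    .map((p) => ({ ...p, distanceKm: haversineKm(target, p) }))
    .sort((x, y) => x.distanceKm - y.distanceKm)
    .slice(0, k);
}

// Usage
const points = [
  { id: "a", lat: 37.7749, lng: -122.4194 },
  { id: "b", lat: 37.8044, lng: -122.2712 },
  { id: "c", lat: 37.7793, lng: -122.4193 },
  { id: "d", lat: 40.7128, lng: -74.006 },
];
const index = buildIndex(points, 1.0);
const nearest = kNearestInCell(index, { lat: 37.776, lng: -122.418 }, 2, 1.0);
console.log(nearest.map((p) => p.id)); // [ 'a', 'c' ]
```

Point `d` sits in a different bucket and is never examined at all, which is where the speedup comes from; point `b` shares the target's cell, so it is a candidate but ranks below `a` and `c`.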
There are some trade-offs with this performance gain:
- Storing S2 cell IDs with your points means slightly more storage (one 64-bit integer per point for each level you index).
- You may miss nearby points that lie just outside the S2 cell you queried. Indexing on multiple S2 levels can help ensure you retrieve enough points; depending on the density of your points, this might not be an issue.
- Retrieving by S2 cell ID doesn't give you the distance between points; you'll have to calculate that yourself.
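One way to soften the missed-points problem is sketched below, again on a hypothetical lat/lng grid rather than real S2 cells: each point is indexed at several cell sizes (finest first, standing in for several S2 levels), and the query scans the target's cell plus its eight neighbors, moving to a coarser level until it has at least K candidates. It is a heuristic: a true nearest point can still sit outside the scanned block, so it reduces misses rather than guaranteeing exact results. Distances are computed separately with the haversine formula, because cell membership alone says nothing about distance.

```javascript
// Sketch of the multi-level workaround. ASSUMPTION: a hypothetical lat/lng grid
// stands in for S2 levels (smaller cellDeg = finer level); none of these
// helpers come from an S2 library.
const EARTH_RADIUS_KM = 6371; // mean Earth radius (approximate)

// Distance must be computed separately: cell membership says nothing about it.
function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

function cellCoords(lat, lng, cellDeg) {
  return [Math.floor(lat / cellDeg), Math.floor(lng / cellDeg)];
}

// Index every point once per level; levelsDeg is ordered finest to coarsest.
function buildMultiLevelIndex(points, levelsDeg) {
  return levelsDeg.map((cellDeg) => {
    const cells = new Map();
    for (const p of points) {
      const [row, col] = cellCoords(p.lat, p.lng, cellDeg);
      const key = `${row}:${col}`;
      if (!cells.has(key)) cells.set(key, []);
      cells.get(key).push(p);
    }
    return { cellDeg, cells };
  });
}

// Scan the target's cell plus its 8 neighbors at the finest level that yields
// at least k candidates, falling back to coarser levels. Heuristic, not exact kNN.
function kNearestMultiLevel(levels, target, k) {
  let candidates = [];
  for (const { cellDeg, cells } of levels) {
    const [row, col] = cellCoords(target.lat, target.lng, cellDeg);
    candidates = [];
    for (let dr = -1; dr <= 1; dr++) {
      for (let dc = -1; dc <= 1; dc++) {
        candidates.push(...(cells.get(`${row + dr}:${col + dc}`) ?? []));
      }
    }
    if (candidates.length >= k) break;
  }
  return candidates
    .map((p) => ({ ...p, distanceKm: haversineKm(target, p) }))
    .sort((x, y) => x.distanceKm - y.distanceKm)
    .slice(0, k);
}

// Usage: levels of 0.01, 0.1 and 1 degree.
const points = [
  { id: "a", lat: 37.7749, lng: -122.4194 },
  { id: "b", lat: 37.8044, lng: -122.2712 },
  { id: "c", lat: 37.7793, lng: -122.4193 },
];
const levels = buildMultiLevelIndex(points, [0.01, 0.1, 1.0]);
console.log(kNearestMultiLevel(levels, { lat: 37.776, lng: -122.418 }, 3).map((p) => p.id));
// [ 'a', 'c', 'b' ]
```

Here the 0.01° and 0.1° levels each find only two candidates, so the query falls back to the 1° level to reach K = 3.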
Here's a code example from the Node S2 library:
Here's a map visualization of the S2 tokens that were created.
Long story short: yes, it is a form of hashing, so you get faster lookups at the cost of extra storage, but you may have to tune some aspects of accuracy depending on your requirements.