Performing a location proximity search on a database using the S2 Geometry Library


I am working on a project that requires fast proximity queries on a database with location data.

In my database I want to store locations with additional information. The idea is that the user opens a map at a certain location and my program fetches only the markers visible to the user. With millions of rows, fetching markers from NYC while I'm zoomed in on London would make the map activity extremely slow, and the data sent back from the db would be HUGE.

That's why, when the user opens the map, I want to fetch all the markers within, for example, 10 km of the center of the map. (I'm okay with fetching markers outside of the visible area. I just don't want to fetch markers that are 100 km away.)

After thorough research I chose the S2 Geometry Library approach, which is based on the Hilbert space-filling curve.

The idea of mapping a 2D position to a single integer, where the longer the prefix two indexes share, the closer they are spatially, was a big selling point.

I need my database to perform this SELECT query lightning fast, and I expect to have A LOT of data in the future, so operating on only one column is a big plus.

What intrigued me the most was the ability to perform fast proximity searches, since two points that are close to each other on the map should also have 1D indexes that are close to each other.
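To check that intuition, here is a minimal sketch on a toy 4x4 grid. It uses the textbook Hilbert-curve mapping, not S2's actual cell numbering, and the function name `hilbert_index` is made up for this example. It suggests the property only holds reliably in one direction: indexes that are close are close on the map, but two neighbouring cells can still end up far apart on the curve.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its
    position along a Hilbert curve that starts at (0, 0)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Two cells that touch on the map, on either side of the grid's midline:
left = hilbert_index(4, 1, 0)
right = hilbert_index(4, 2, 0)
print(left, right)  # 1 14
```

So on this toy grid a window of "index ± x" around cell 1 would need x >= 13 to reach its direct neighbour, at which point it covers almost the whole grid.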

Hilbert's Curve

The idea looks very simple (unless I'm missing something).

The thing I'm having problems with is how to pick (if it's even possible) the min value and max value on the 1D curve so that I can be sure I'm scanning the whole visible area.

Most of the answers and tutorials I find on the internet propose a solution where you cover the bounding area with smaller S2 index "boxes" and then scan every index in the database to see if it's contained in one of the "boxes" from the array. This is easy to do, but with 50 million records it's not feasible to go through every single one of them to see if it's in one of the "boxes".
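One way the "boxes" idea could avoid a full scan, sketched under assumptions: every S2 cell's descendants occupy one contiguous interval of ids, so each box can be expressed as a (lo, hi) pair and checked with an indexed range predicate instead of a per-row containment test. The helper name `build_range_query`, the table name `markers`, and the two example intervals are hypothetical; the `%s` placeholders follow the psycopg parameter style, and the column is assumed to hold leaf cell ids with a B-tree index on it.

```python
def build_range_query(ranges, table="markers", column="s2cellid"):
    """Return (sql, params) selecting rows whose id lies in any (lo, hi) range.

    `ranges` is a list of inclusive (lo, hi) id intervals, one per covering cell.
    """
    if not ranges:
        raise ValueError("at least one range is required")
    clauses = " OR ".join(f"{column} BETWEEN %s AND %s" for _ in ranges)
    params = [bound for lo_hi in ranges for bound in lo_hi]
    return f"SELECT * FROM {table} WHERE {clauses}", params


# Hypothetical covering made of two cells, already converted to id intervals.
ranges = [
    (0x4719480000000001, 0x47194FFFFFFFFFFF),
    (0x4719500000000001, 0x471957FFFFFFFFFF),
]
sql, params = build_range_query(ranges)
print(sql)
```

In a real setup the covering would come from the library's region coverer rather than being hard-coded, and the number of cells in it can usually be capped to keep the query small.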

What I have in mind is a solution where you take the minimum and maximum index values of the area you're searching in and perform something along the lines of SELECT (...) WHERE s2cellid BETWEEN min AND max

For example, I'm at location 47194c and want to fetch all markers within 10 km, so I take a value that's x to the left of the index and a value that's x to the right of the index, and perform a BETWEEN 47194c-x AND 47194c+x query.

Is something like that possible with the S2 library? If not, what approach should I take to make my queries as quick as possible?
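For reference, a rough sketch of what a value like 47194c stands for, assuming it is an S2 cell token (the 64-bit cell id written in hex with trailing zeros stripped). The helper names are made up for this example; the bit arithmetic follows the S2 cell id layout, where the lowest set bit marks the cell's level. The min and max computed here bound exactly the ids inside that one cell, which is a different thing from "x to the left and right" in terms of distance.

```python
def token_to_id(token):
    """Expand a token back to the full 64-bit cell id."""
    return int(token.ljust(16, "0"), 16)


def cell_level(cell_id):
    """Level 0 is a whole cube face, level 30 is a leaf cell."""
    lsb = cell_id & -cell_id  # lowest set bit marks the level
    return 30 - (lsb.bit_length() - 1) // 2


def cell_range(cell_id):
    """Inclusive interval of leaf ids contained in this cell."""
    lsb = cell_id & -cell_id
    return cell_id - (lsb - 1), cell_id + (lsb - 1)


cell = token_to_id("47194c")
lo, hi = cell_range(cell)
print(cell_level(cell))  # 9
print(hex(lo), hex(hi))  # 0x4719480000000001 0x47194fffffffffff
```

Every marker stored as a leaf id inside cell 47194c would then satisfy s2cellid BETWEEN lo AND hi, while markers just across the cell's edge would not, regardless of how near they are on the map.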

Thanks in advance :)

[I plan on using PostgreSQL]
