Distance-based JOIN given Latitude/Longitude

3.8k views Asked by At

Given the following tables:

table A (id, latitude, longitude)
table B (id, latitude, longitude)

how do I build an efficient T-SQL query that associates each row in A with the closest row in B?

The ResultSet should contains all the rows in A and associate them with 1 and only 1 element in B. The format that I'm looking for is the following:

(A.id, B.id, distanceAB)

I have a function that calculates the distance given 2 pairs of latitude and longitude. I tried something using order by ... limit 1 and/or rank() over (partition by ...) as rowCount ... where rowCount = 1 but the result is either not really what I need or it takes too long to return.

Am I missing something?

3

There are 3 answers

0
tpolyak On

It's possible with the join of two subqueries. The first contains all distances between A and B locations, the second contains only the minimum distance of B locations from A locations.

SELECT x.aid, x.bid, x.distance
FROM
(SELECT A.ID AS aid, 
        B.ID AS bid, 
        SQRT(A.Latitude * A.Latitude + B.Longitude * B.Longitude) AS Distance
     FROM LocationsA AS A 
     CROSS JOIN LocationsB AS B) x JOIN
(SELECT A.ID AS aid, 
        MIN(SQRT(A.Latitude * A.Latitude + B.Longitude * B.Longitude)) AS Distance
     FROM LocationsA AS A 
     CROSS JOIN LocationsB AS B
     GROUP BY A.ID) y ON x.aid = y.aid AND x.Distance = y.Distance
0
Chad On

There's no way to get around the fact that you're going to have to compare every record in A with every record in B, which is obviously going to scale poorly if both A and B contain a lot of records.

That being said, this will return correct results:

SELECT aid, bid, distanceAB
FROM (
  SELECT aid, bid, distanceAB,
    dense_rank() over (partition by aid order by distanceAB) as n
  FROM (
    SELECT a.id as aid, B.id as bid,
      acos(sin(radians(A.lat)) * sin(radians(B.lat)) +
        cos(radians(A.lat)) * cos(radians(B.lat)) *
        cos(radians(A.lon - B.lon))) * 6372.8 as distanceAB
    FROM A cross join B
  ) C
) D
WHERE n = 1

This will return in a reasonable amount of time if your sets aren't too large. With 3 locations in A and 130,000 or so in B, it takes about one second on my machine. 1,000 records in each takes about 40s. Like I said, it scales poorly.

It should be noted that Sparky's answer can return incorrect results under certain circumstances. Suppose your A location is at +40,+100. +40,+111 would not be returned, even though it's closer than +49,+109.

0
Sparky On

This is one approach that should have deceent performance, but a big caveat is that it might not find any results

    select top 1 a.id,b.id,dbo.yourFunction() as DistanceAB
    from a 
    join b on b.latitude between a.latitude-10 and a.latitude+10 and
              b.longititude between a.longitude-10 and b.longittude+10
    order by 3

What you are basically doing is looking for any B row within roughly a 20 unit radius of A and then sorting it by your function to determine the closest. You can adjust the unit radius as needed. While it is not exact, it should reduce the size of the result set and should give you decent performance results.