MySQL query to calculate distance between two coordinates but too slow with HAVING clause

327 views Asked by At

I have the following query that calculates the distance between two coordinates. However, I want to select the users that are within the range of n km. Say, for example, I want to select all users that are within a distance of 100km. Therefore, I have to use HAVING clause here. However, I see that the execution is damn too slow. Even with just two records it's returning the results very slowly. I wonder what would happen with a million user records (in future). Therefore, I am seeking an optimization to my current query for efficient and faster running.

SELECT *, 
  111.1111 *
  DEGREES(ACOS(LEAST(1.0, COS(RADIANS(a.latitude)) 
  * COS(RADIANS(b.latitude)) 
  * COS(RADIANS(a.longitude) - RADIANS(b.longitude)) 
  + SIN(RADIANS(a.latitude))
  * SIN(RADIANS(b.latitude))))) AS distance_in_km 
FROM users AS a
JOIN users AS b ON a.id <> b.id
WHERE b.id != :user AND a.id = :user 
HAVING distance_in_km < :maxdist 
LIMIT 30

UPDATE

As suggested by Rick James, I have removed GROUP BY clause and replaced it with AND a.id = :user in WHERE clause. This, as of now, is returning the same result as was with GROUP BY.

2

There are 2 answers

9
Gordon Linoff On BEST ANSWER

The basic answer is that you cannot make your query more efficient. For your approach, you basically need to calculate the distance between all pairs of users, and that is expensive.

There may be some hacks that you can use. First, you probably don't need reversed pairs, so you can replace a.id <> b.id with a.id < b.id. That will eliminate half the work.

You can use a where clause to pre-filter the rows. For instance, on most of the earth's surface, points that are more than 2 degrees of latitude or longitude apart are more than 100 km apart. This is not true everywhere. But it is probably good enough for you. That allows you to write:

where a.latitude between b.latitude - 2 and b.latitude + 2 and
      a.longitude between b.longitude - 2 and b.longitude + 2

That will save much of the trigonometry, if your users are spread far and wide.

However, the real solution is to use the GIS extensions of MySQL. A place to start learning about that is in the documentation.

EDIT:

SELECT *,
   111.1111 *
   DEGREES(ACOS(LEAST(1.0, COS(RADIANS(a.latitude))
         * COS(RADIANS(b.latitude))
         * COS(RADIANS(a.longitude) - RADIANS(b.longitude))
         + SIN(RADIANS(a.latitude))
         * SIN(RADIANS(b.latitude))))) AS distance_in_km
FROM users a JOIN
     users b
     ON a.id <> b.id
WHERE a.id = :user AND
      a.latitude between b.latitude - 2 and b.latitude + 2 and
      a.longitude between b.longitude - 2 and b.longitude + 2
HAVING distance_in_km < 100
8
James On

The way to do what you want would be repeating the code on your select in the where section

SELECT *, 
   111.1111 *
   DEGREES(ACOS(LEAST(1.0, COS(RADIANS(a.Latitude))
         * COS(RADIANS(b.Latitude))
         * COS(RADIANS(a.Longitude) - RADIANS(b.Longitude))
         + SIN(RADIANS(a.Latitude))
         * SIN(RADIANS(b.Latitude))))) AS distance_in_km
FROM users AS a
  JOIN users AS b ON a.id <> b.id
WHERE
    111.1111 *
    DEGREES(ACOS(LEAST(1.0, COS(RADIANS(a.Latitude))
         * COS(RADIANS(b.Latitude))
         * COS(RADIANS(a.Longitude) - RADIANS(b.Longitude))
         + SIN(RADIANS(a.Latitude))
         * SIN(RADIANS(b.Latitude))))) < 100

Other option will be (better "look and feel", worse performance)

with d as (
    SELECT *, 
       111.1111 *
       DEGREES(ACOS(LEAST(1.0, COS(RADIANS(a.Latitude))
             * COS(RADIANS(b.Latitude))
             * COS(RADIANS(a.Longitude) - RADIANS(b.Longitude))
             + SIN(RADIANS(a.Latitude))
             * SIN(RADIANS(b.Latitude))))) AS distance_in_km
    FROM users AS a
      JOIN users AS b ON a.id <> b.id
)
select * from d
where d.distance_in_km > 100