How do I calculate the minimum geo distance between multiple point combinations faster in Python?


I am trying to find the distance from each customer to the nearest store. My data currently contains ~1,500 stores and ~670K customers, so I have to calculate the geo distance for 670K customers x 1,500 stores and take the minimum for each customer.

I have created the haversine function below:

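import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between points given in decimal degrees."""
    # Convert degrees to radians
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    # Haversine formula
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    # 6367 is the Earth radius in km; dividing by 1.609 converts km to miles
    miles = 6367 * c/1.609
    return miles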

My data consists of two data frames, one for the customers (cst_geo) and one for the stores (store_geo). The numbers below are made up because I can't share a snippet of the real data:

cst_geo:

Customer ID   Latitude   Longitude
A123          39.342     -40.800
B456          38.978     -41.759
C789          36.237     -77.348

store_geo:

Store ID      Latitude   Longitude
S1            59.342     -60.800
S2            28.978     -71.759
S3            56.237     -87.348

I wrote the nested for loop below to do this calculation, but it took more than 8 hours to run. I also tried deco but wasn't able to speed it up any further.

mindist = []
for i in cst_geo.index:
    dist = []
    for j in store_geo.index:
        dist.append(haversine_np(cst_geo.longitude[i], cst_geo.latitude[i],
                                 store_geo.longitude[j], store_geo.latitude[j]))    
    mindist.append(min(dist))

There is 1 answer

AlexisG On

This can be done with geopy, whose geodesic function takes two (latitude, longitude) tuples:

from geopy.distance import geodesic

customers = [
    (39.342, -40.800),
    (38.978, -41.759),
    (36.237, -77.348),
]
stores = [
    (59.342, -60.800),
    (28.978, -71.759),
    (56.237, -87.348),
]
# Build every row as its own list; [[None] * n] * m would repeat one shared row m times
matrix = [[None] * len(customers) for _ in stores]
for index, i in enumerate(customers):
    for j_index, j in enumerate(stores):
        # matrix[j_index][index]: distance in meters from customer `index` to store `j_index`
        matrix[j_index][index] = geodesic(i, j).meters

output

matrix has one row per store and one column per customer. For example, the last row, matrix[2], holds the distances in meters from the three customers to the third store:

[3861568.3809260903, 3831526.290564832, 2347407.258650098]
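The question asks for the minimum distance per customer, so the matrix still has to be reduced. A minimal sketch, assuming a matrix laid out with one row per store and one column per customer; the values here are made up for illustration and the names min_dist and nearest_store are hypothetical:

```python
# Hypothetical distance matrix in meters: rows are stores, columns are customers.
# The values are made up for illustration only.
matrix = [
    [5000.0, 1200.0, 900.0],
    [300.0, 4500.0, 7000.0],
    [2500.0, 800.0, 950.0],
]

# zip(*matrix) transposes the matrix, so each tuple holds one customer's
# distances to every store; min() then picks the smallest of them.
min_dist = [min(column) for column in zip(*matrix)]

# Row index of the nearest store for each customer.
nearest_store = [column.index(min(column)) for column in zip(*matrix)]

print(min_dist)       # [300.0, 800.0, 900.0]
print(nearest_store)  # [1, 2, 0]
```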

You can also get the distance in other units through attributes such as .kilometers, .miles or .feet.
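A Python-level double loop over 670K x 1,500 pairs will likely stay slow whichever distance function it calls. One common alternative is to let NumPy broadcasting compute the haversine formula from the question for a whole block of customers against all stores at once. The sketch below is only an illustration under assumptions: the function name min_store_distance and the chunk_size parameter are hypothetical, it uses the spherical haversine distance rather than geopy's geodesic, and it takes plain arrays of latitudes and longitudes (for example the columns of cst_geo and store_geo).

```python
import numpy as np

EARTH_RADIUS_KM = 6367   # same radius as haversine_np in the question
KM_PER_MILE = 1.609      # same conversion factor as in the question


def min_store_distance(cst_lat, cst_lon, store_lat, store_lon, chunk_size=10_000):
    """For each customer, return the haversine distance in miles to the
    nearest store and the positional index of that store.

    Hypothetical helper; chunk_size limits memory use and is an assumption.
    """
    cst_lat = np.radians(np.asarray(cst_lat, dtype=float))
    cst_lon = np.radians(np.asarray(cst_lon, dtype=float))
    # Shape (1, n_stores) so the stores broadcast against (chunk, 1) customers
    s_lat = np.radians(np.asarray(store_lat, dtype=float))[None, :]
    s_lon = np.radians(np.asarray(store_lon, dtype=float))[None, :]

    n = cst_lat.shape[0]
    min_miles = np.empty(n)
    nearest = np.empty(n, dtype=np.intp)

    # Process customers in chunks so the (chunk, n_stores) matrix fits in memory
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        lat = cst_lat[start:stop, None]
        lon = cst_lon[start:stop, None]

        a = (np.sin((s_lat - lat) / 2.0) ** 2
             + np.cos(lat) * np.cos(s_lat) * np.sin((s_lon - lon) / 2.0) ** 2)
        # Clip guards against rounding pushing a slightly outside [0, 1]
        c = 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
        miles = EARTH_RADIUS_KM * c / KM_PER_MILE

        idx = miles.argmin(axis=1)
        nearest[start:stop] = idx
        min_miles[start:stop] = miles[np.arange(stop - start), idx]

    return min_miles, nearest


# The made-up sample points from the question
cst_lat = [39.342, 38.978, 36.237]
cst_lon = [-40.800, -41.759, -77.348]
store_lat = [59.342, 28.978, 56.237]
store_lon = [-60.800, -71.759, -87.348]

min_miles, nearest = min_store_distance(cst_lat, cst_lon, store_lat, store_lon)
print(min_miles, nearest)
```

With data frames, the inputs would be something like cst_geo.latitude.to_numpy() and store_geo.latitude.to_numpy(). If chunked broadcasting is still too slow, a spatial index that supports the haversine metric may be worth trying.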