Return Elasticsearch distance for array of geo points


I need to return the distance to every geo point stored in an array field of each Elasticsearch document. At the moment, my results contain only one distance for the whole array.

I started with the code from this Stack Overflow question: Return distance in elasticsearch results?

My Elasticsearch query body contains this:

{
  "stored_fields": [ "_source" ],
  "script_fields": {
    "distance": {
      "script": {
        "inline": "doc['locations.facility.address.coordinates'].arcDistance(params.lat,params.lon) * 0.001",
        "lang": "painless",
        "params": {
          "lat": 2.27,
          "lon": 50.3
        }
      }
    }
  }
}

My Elasticsearch source documents look like this when returned (note that locations is an array):

"locations": [
    {
      "facility": {
        "address": {
          "country_code": "US",
          "city": "San Diego",
          "coordinates": {
            "lon": -117.165,
            "lat": 32.8408
          },
          "country_name": "United States",
          "state_province": "California",
          "postal_code": "92123"
        }
      }
    },
    {
      "facility": {
        "address": {
          "country_code": "US",
          "city": "Tampa",
          "coordinates": {
            "lon": -82.505,
            "lat": 28.0831
          },
          "country_name": "United States",
          "state_province": "Florida",
          "postal_code": "33613"
        }
      }
    }
]

Currently, my results look like this:

    "fields": {
      "distance": [
        13952.518249603361
      ]
    }

But I need the distance array to contain one value for each entry in locations.

Accepted answer, by Joe - Check out my books

This one's tricky.

According to the documentation and the source code, the arcDistance method is only available on the doc values as a whole, not on the individual geo points underlying them, and it is computed from the field's first value only. That's why your query returns a single distance per document.

In other words, although we could iterate over doc['locations.facility.address.coordinates'], the individual geo points don't implement any geo distance methods.

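That's a bummer, so we'll have to implement our own geo distance function, perhaps using the haversine formula, and read the coordinates from params._source, which also keeps the distances in the same order as the locations array. (The triple-quoted """ script string is Kibana Dev Tools console syntax; from other clients, send the script as a regular JSON string with escaped newlines.)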

{
  "stored_fields": [
    "_source"
  ],
  "script_fields": {
    "distance": {
      "script": {
        "inline": """
          // Haversine great-circle distance in meters between two points given in degrees
          float distFrom(float lat1, float lng1, float lat2, float lng2) {
            double earthRadius = 6371000; // meters
            double dLat = Math.toRadians(lat2-lat1);
            double dLng = Math.toRadians(lng2-lng1);
            double a = Math.sin(dLat/2) * Math.sin(dLat/2) +
                       Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) *
                       Math.sin(dLng/2) * Math.sin(dLng/2);
            double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
            float dist = (float) (earthRadius * c);

            return dist;
          }

          // One distance in km per entry of the locations array, in _source order
          return params._source.locations.stream().map(location -> {
              def lat = (float) location.facility.address.coordinates.lat;
              def lon = (float) location.facility.address.coordinates.lon;
              return distFrom(lat, lon, (float) params.lat, (float) params.lon) * 0.001;
          }).collect(Collectors.toList())
        """,
        "lang": "painless",
        "params": {
          "lat": 2.27,
          "lon": 50.3
        }
      }
    }
  }
}

which yields:

"hits" : {
  ...
  "hits" : [
    {
      ...
      "_source" : {
        "locations" : [
          { ... },
          { ... }
        ]
      },
      "fields" : {
        "distance" : [
          15894.470000000001,
          13952.498
        ]
      }
    }
  ]
}
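For reference, distFrom implements the standard haversine formula. Writing φ for latitude and λ for longitude (both converted to radians), with Δφ = φ2 − φ1, Δλ = λ2 − λ1 and R = 6,371,000 m as in the script, the script computes the following and then multiplies d by 0.001 to report kilometres:

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \cos\varphi_2 \,\sin^2\!\left(\frac{\Delta\lambda}{2}\right),\\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right),\\
d &= R\,c .
\end{aligned}
```

As a quick check, two points on the equator 90° of longitude apart give a = 1/2, c = π/2 and d = R·π/2 ≈ 10,007.5 km, a quarter of the equatorial circumference for this R.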

To be honest, when this much scripting is required, something has usually gone wrong.

Generally speaking, scripts should be avoided where possible. This one runs for every returned hit and has to load and parse each hit's _source.

More importantly, if you're not sorting by these geo distances, the whole computation belongs outside Elasticsearch, wherever you post-process the search results. I use Turf for JavaScript geo calculations, for instance.
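A minimal Python sketch of that post-processing step, assuming the search response has already been parsed into a dict and that each hit's _source keeps the locations layout from the question. The helper names (haversine_km, add_location_distances) and the per-hit "distances" key are hypothetical, not part of any Elasticsearch client; the formula and the 6,371 km radius are the same ones the Painless script uses.

```python
import math

EARTH_RADIUS_M = 6371000.0  # same mean Earth radius as the Painless script


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    d_lat = math.radians(lat2 - lat1)
    d_lon = math.radians(lon2 - lon1)
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(d_lon / 2) ** 2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_M * c * 0.001


def add_location_distances(response, origin_lat, origin_lon):
    """Attach a hypothetical 'distances' list (km, one per location, in array order) to each hit."""
    for hit in response["hits"]["hits"]:
        locations = hit["_source"].get("locations", [])
        hit["distances"] = [
            haversine_km(origin_lat, origin_lon,
                         loc["facility"]["address"]["coordinates"]["lat"],
                         loc["facility"]["address"]["coordinates"]["lon"])
            for loc in locations
        ]
    return response


# Usage with a trimmed-down hit shaped like the question's documents
response = {"hits": {"hits": [{"_source": {"locations": [
    {"facility": {"address": {"coordinates": {"lat": 32.8408, "lon": -117.165}}}},
    {"facility": {"address": {"coordinates": {"lat": 28.0831, "lon": -82.505}}}},
]}}]}}
add_location_distances(response, 2.27, 50.3)
print(response["hits"]["hits"][0]["distances"])
```

Because this runs in double precision, the values can differ from the script's float-based output in the last digits. The search request then no longer needs script_fields at all.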

Finally, when you store multiple locations/facilities in one array, I'd suggest mapping that array as a nested field. Nested fields keep each location's fields together instead of flattening the array, and they support sorting that makes sense.
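A sketch of what that could look like, written as Python dicts to be serialized into request bodies. It assumes a 7.x cluster (typeless mappings, and the nested option of the _geo_distance sort); the helper names are hypothetical and the field path mirrors the question.

```python
import json

COORDS_FIELD = "locations.facility.address.coordinates"


def nested_locations_mapping():
    """Index mapping body: 'locations' as a nested field holding a geo_point."""
    return {
        "mappings": {
            "properties": {
                "locations": {
                    "type": "nested",
                    "properties": {
                        "facility": {
                            "properties": {
                                "address": {
                                    "properties": {
                                        "coordinates": {"type": "geo_point"}
                                    }
                                }
                            }
                        }
                    },
                }
            }
        }
    }


def nearest_location_sort_query(lat, lon):
    """Search body that sorts documents by their closest nested location."""
    return {
        "sort": [
            {
                "_geo_distance": {
                    COORDS_FIELD: {"lat": lat, "lon": lon},
                    "order": "asc",
                    "unit": "km",
                    "mode": "min",  # use the nearest of a document's locations
                    "nested": {"path": "locations"},
                }
            }
        ]
    }


print(json.dumps(nearest_location_sort_query(2.27, 50.3), indent=2))
```

Each hit's sort array then carries the distance, in km, to that document's nearest location. Note that an existing field can't be switched to nested in place; this needs a new index and a reindex.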