Using a single query for multiple searches in Elasticsearch


I have a dataset whose documents are uniquely identified by three fields, let's say "name", "timestamp" and "country". I use elasticsearch-dsl-py, but I can read native Elasticsearch queries, so answers in that form are fine too.
Here's my code to get a single document by the three fields:

def get(name, timestamp, country):
    search = Item.search()
    search = search.filter("term", name=name)
    search = search.filter("term", timestamp=timestamp)
    search = search.filter("term", country=country)
    search = search[:1]
    return search.execute()[0]

This works fine, but sometimes I need to get 200+ items, and calling this function once per item means 200+ queries to ES.
What I'm looking for is a single query that takes a list of these three-field identifiers and returns all the matching documents, in any order.
I've tried combining ORs and ANDs, but the performance is still poor, although at least I'm no longer making 200 round trips to the server.

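from elasticsearch_dsl import Q

def get_batch(list_of_identifiers):
    search = Item.search()
    batch_query = None
    for ref in list_of_identifiers:
        # AND the three fields together for one item
        sub_query = Q("match", name=ref["name"])
        sub_query &= Q("match", timestamp=ref["timestamp"])
        sub_query &= Q("match", country=ref["country"])
        # OR the per-item queries together; the explicit None check
        # avoids relying on the truthiness of query objects
        if batch_query is None:
            batch_query = sub_query
        else:
            batch_query |= sub_query
    search = search.filter(batch_query)
    return search.scan()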
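
For anyone who prefers the native form, here is a minimal sketch of the same OR-of-ANDs idea written as a plain query body. It is only an illustration: `build_batch_query` is a hypothetical helper name, and it assumes the three fields are mapped so that exact `term` lookups work (for example `keyword` for "name" and "country"), which is what my single-item function above relies on.

```python
def build_batch_query(list_of_identifiers):
    """Build one bool query: OR across items, AND across the three fields.

    Hypothetical helper; assumes exact-match (term) lookups are valid
    for the "name", "timestamp" and "country" fields.
    """
    should = [
        {
            "bool": {
                # All three term clauses must match for a given item
                "filter": [
                    {"term": {"name": ref["name"]}},
                    {"term": {"timestamp": ref["timestamp"]}},
                    {"term": {"country": ref["country"]}},
                ]
            }
        }
        for ref in list_of_identifiers
    ]
    return {
        "query": {
            "bool": {
                # Filter context: no relevance scoring is computed
                "filter": [
                    {"bool": {"should": should, "minimum_should_match": 1}}
                ]
            }
        }
    }
```

The resulting dict could be sent as the body of a regular search request. Everything sits in filter context, so no scoring is involved; whether that changes the timings I'm seeing is something I have not verified.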

Is there a faster or better approach to this problem?
Would a multi-search be faster than combining should/must clauses (ORs/ANDs) in a single query?

EDIT: I tried multi-search and there was virtually no difference in timing. For 6 items the result comes back in about 60 ms; for 200 items it takes 4-5 seconds.
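
To make the multi-search variant concrete, here is a rough sketch of how a request body for the `_msearch` endpoint could be assembled, one small search per identifier. It is not necessarily the exact code I ran: `build_msearch_body` is a hypothetical helper, and the index name `items` is an assumption for illustration.

```python
import json

def build_msearch_body(list_of_identifiers, index="items"):
    """Build a newline-delimited JSON body for the _msearch endpoint.

    Hypothetical helper; the index name "items" is an assumption.
    Each search is a header line followed by a body line.
    """
    lines = []
    for ref in list_of_identifiers:
        header = {"index": index}
        body = {
            "size": 1,  # each identifier should match at most one document
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"name": ref["name"]}},
                        {"term": {"timestamp": ref["timestamp"]}},
                        {"term": {"country": ref["country"]}},
                    ]
                }
            },
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # Every line, including the last, is terminated by a newline
    return "".join(line + "\n" for line in lines)
```

The responses come back in the same order as the searches, so each result can be paired with the identifier that produced it.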
