I have a dataset whose documents are identified by three fields, let's say "name", "timestamp" and "country". I use elasticsearch-dsl-py, but I can read native Elasticsearch queries, so answers in either form are fine.
Here's my code to get a single document by the three fields:
def get(name, timestamp, country):
    search = Item.search()
    search = search.filter("term", name=name)
    search = search.filter("term", timestamp=timestamp)
    search = search.filter("term", country=country)
    search = search[:1]
    return search.execute()[0]
This works fine, but sometimes I need to fetch 200+ items, and calling this function once per item means 200 queries to ES.
What I'm looking for is a single query that takes a list of (name, timestamp, country) identifiers and returns all the documents matching any of them, in any order.
I've tried combining ORs and ANDs, but the performance is still poor, although at least I'm not making 200 round trips to the server.
from elasticsearch_dsl import Q

def get_batch(list_of_identifiers):
    search = Item.search()
    batch_query = None
    for ref in list_of_identifiers:
        # AND the three fields together for one identifier
        sub_query = Q("match", name=ref["name"])
        sub_query &= Q("match", timestamp=ref["timestamp"])
        sub_query &= Q("match", country=ref["country"])
        # OR the per-identifier queries into one batch query
        if batch_query is None:
            batch_query = sub_query
        else:
            batch_query |= sub_query
    search = search.filter(batch_query)
    return search.scan()
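For reference, in native query terms the batch lookup has roughly the shape below. This is only a sketch of the structure, not the exact JSON that elasticsearch-dsl emits: `build_batch_query` is a hypothetical helper name, and it assumes the three fields are mapped as exact-value fields (keyword/date), which is why it uses `term` clauses in filter context rather than `match`.

```python
import json


def build_batch_query(identifiers):
    """Build a native bool query matching ANY of the given identifiers.

    Hypothetical helper for illustration. Each identifier is a dict with
    "name", "timestamp" and "country" keys, as in get_batch().
    """
    # One bool/filter clause per identifier: all three fields must match (AND).
    per_item = [
        {
            "bool": {
                "filter": [
                    {"term": {"name": ref["name"]}},
                    {"term": {"timestamp": ref["timestamp"]}},
                    {"term": {"country": ref["country"]}},
                ]
            }
        }
        for ref in identifiers
    ]
    # Wrap the per-identifier clauses in a should (OR), itself in filter
    # context so no relevance scores are computed.
    return {
        "query": {
            "bool": {
                "filter": [
                    {"bool": {"should": per_item, "minimum_should_match": 1}}
                ]
            }
        }
    }


if __name__ == "__main__":
    # Made-up sample identifiers, only to show the resulting structure.
    sample = [
        {"name": "a", "timestamp": "2020-01-01T00:00:00", "country": "US"},
        {"name": "b", "timestamp": "2020-01-02T00:00:00", "country": "DE"},
    ]
    print(json.dumps(build_batch_query(sample), indent=2))
```

With 200 identifiers this produces 200 `should` clauses of three `term` filters each, which is the query whose performance I'm asking about.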
Is there a faster/better approach to this problem?
Would a multi-search be faster than combining should/must clauses (ORs/ANDs) in a single query?
EDIT: I tried multi-search and there was virtually no difference in timing. For 6 items it takes 60 ms to get the result; for 200 items it takes 4-5 seconds.
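To make the comparison concrete, a multi-search request for the same lookups can be sketched as an NDJSON body with one header line and one search line per identifier. This is illustrative rather than the exact code used for the timings above: `build_msearch_body` is a hypothetical helper name, the empty header assumes the index is given in the request URL, and the `term` clauses assume exact-value field mappings.

```python
import json


def build_msearch_body(identifiers):
    """Build an NDJSON body for the _msearch endpoint.

    Hypothetical helper for illustration: one single-hit search per
    identifier, mirroring what get() does for a single document.
    """
    lines = []
    for ref in identifiers:
        # Header line: empty object, so the index comes from the request URL.
        lines.append(json.dumps({}))
        # Body line: the same three term filters as get(), limited to one hit.
        lines.append(json.dumps({
            "size": 1,
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"name": ref["name"]}},
                        {"term": {"timestamp": ref["timestamp"]}},
                        {"term": {"country": ref["country"]}},
                    ]
                }
            },
        }))
    # The multi-search body must end with a newline.
    return "\n".join(lines) + "\n"
```

This still sends one sub-search per identifier, just in a single round trip, so 200 identifiers mean 200 searches executed on the cluster.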