My question is: what is the best practice for querying a MongoDB when you need to do either a fuzzy text search and sort by search score, or do an unfiltered query that sorts by a property, but both are required to return a pagination token?
To elaborate, I have a MongoDB with documents that contain a name and description property and I have the following business requirements:
- When a user inputs text into the search field: perform a fuzzy text search on the
descriptionfield. Return the first 10 results, sorted by the fuzzy match score. Allow pagination via a "Show More" button. - When a user does not input any text into the search field: Return the first 10 results sorted by
name. Allow pagination via a "Show More" button. - Performance is a major factor.
I have hacked together the following solution. Basically doing the fuzzy search, or doing a .* regex search if there has been no input from the user. This seems like it will be bad for performance once the collection grows, but it was the only way I could think to still utilize a $search in both cases so I could perform pagination without using skip/limit.
pipeline = []
if search:
pipeline.append({
'$search': {
'index': 'autocomplete_index',
'autocomplete': {
'path': 'description',
'query': search,
'tokenOrder': 'any',
'fuzzy': {
'maxEdits': 2,
'prefixLength': 1,
'maxExpansions': 256
}
},
"sort": {'unused': {'$meta': "searchScore"}},
}
})
else:
pipeline.append({
'$search': {
'index': 'regex_index',
'regex': {
'path': 'name',
'query': '.*',
'allowAnalyzedField': True
},
"sort": { 'name': 1 }
}
})
if pagination_token:
for stage in pipeline:
if '$search' in stage:
stage['$search']['searchAfter'] = pagination_token
pipeline.append({
"$limit": page_size
})
pipeline.append({
"$project": {
"_id": 1,
"name": 1,
"description": 1,
"pagination_token": { "$meta" : "searchSequenceToken" }
}
})
results = self.collection.aggregate(pipeline)
I have had to create the following indexes to support the current solution
autocomplete_index
{
"mappings": {
"dynamic": false,
"fields": {
"description": {
"type": "autocomplete"
}
}
}
}
regex_index:
{
"mappings": {
"fields": {
"name": [
{
"type": "token"
},
{
"type": "string"
}
]
}
}
}