MongoDB fuzzy text search or simple sorting but always with pagination token

24 views Asked by At

My question is: what is the best practice for querying a MongoDB when you need to do either a fuzzy text search and sort by search score, or do an unfiltered query that sorts by a property, but both are required to return a pagination token?

To elaborate, I have a MongoDB with documents that contain a name and description property and I have the following business requirements:

  • When a user inputs text into the search field: perform a fuzzy text search on the description field. Return the first 10 results, sorted by the fuzzy match score. Allow pagination via a "Show More" button.
  • When a user does not input any text into the search field: Return the first 10 results sorted by name. Allow pagination via a "Show More" button.
  • Performance is a major factor.

I have hacked together the following solution. Basically doing the fuzzy search, or doing a .* regex search if there has been no input from the user. This seems like it will be bad for performance once the collection grows, but it was the only way I could think to still utilize a $search in both cases so I could perform pagination without using skip/limit.

pipeline = []

if search:
    pipeline.append({
        '$search': {
            'index': 'autocomplete_index',
            'autocomplete': {
                'path': 'description',
                'query': search,
                'tokenOrder': 'any',
                'fuzzy': {
                    'maxEdits': 2,
                    'prefixLength': 1,
                    'maxExpansions': 256
                }
            },
            "sort": {'unused': {'$meta': "searchScore"}},
        }
    })
else:
    pipeline.append({
        '$search': {
            'index': 'regex_index',
            'regex': {
                'path': 'name',
                'query': '.*',
                'allowAnalyzedField': True
            },
            "sort": { 'name': 1 }
        }
    })

if pagination_token:
    for stage in pipeline:
        if '$search' in stage:
            stage['$search']['searchAfter'] = pagination_token

pipeline.append({
    "$limit": page_size
})

pipeline.append({
    "$project": {
        "_id": 1,
        "name": 1,
        "description": 1,
        "pagination_token": { "$meta" : "searchSequenceToken" }
    }
})

results = self.collection.aggregate(pipeline)

I have had to create the following indexes to support the current solution

autocomplete_index

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "description": {
        "type": "autocomplete"
      }
    }
  }
}

regex_index:

{
  "mappings": {
    "fields": {
      "name": [
        {
          "type": "token"
        },
        {
          "type": "string"
        }
      ]
    }
  }
}
0

There are 0 answers