MongoDB Atlas Search - Multiple terms in search-string with 'and' condition (not 'or')

2.9k views Asked by At

In the documentation of MongoDB Atlas search, it says the following for the autocomplete operator:

query: String or strings to search for. If there are multiple terms in a string, Atlas Search also looks for a match for each term in the string separately.

For the text operator, the same thing applies:

query: The string or strings to search for. If there are multiple terms in a string, Atlas Search also looks for a match for each term in the string separately.

Matching each term separately seems odd behaviour to me. We need multiple searches in our app, and for each we expect less results the more words you type, not more.

Example: When searching for "John Doe", I expect only results with both "John" and "Doe". Currently, I get results that match either "John" or "Doe".

Is this not possible using MongoDB Atlas Search, or am I doing something wrong?


Update Currently, I have solved it by splitting the search-term on space (' ') and adding each individual keyword to a separate must-sub-clause (with the compound operator). However, then the search query no longer returns any results if there is one keyword with only one character. To account for that, I split keywords with one character from those with multiple characters. The snippet below works, but for this I need to save two generated fields on each document:

  • searchString: a string with all the searchable fields concatenated. F.e. "John Doe Man Streetstreet Citycity"
  • searchArray: the above string uppercased & split on space (' ') into an array
const must = [];
const searchTerms = 'John D'.split(' ');
for (let i = 0; i < searchTerms.length; i += 1) {
    if (searchTerms[i].length === 1) {
      must.push({
        regex: {
          path: 'searchArray',
          query: `${searchTerms[i].toUpperCase()}.*`,
        },
      });
    } else if (searchTerms[i].length > 1) {
      must.push({
        autocomplete: {
          query: searchTerms[i],
          path: 'searchString',
          fuzzy: {
            maxEdits: 1,
            prefixLength: 4,
            maxExpansions: 20,
           },
         },
       });
    }
}
db.getCollection('someCollection').aggregate([
  {
    $search: {
      compound: { must },
    },
  },
]).toArray();

Update 2 - Full example of unexpected behaviour

Create collection with following documents:

db.getCollection('testing').insertMany([{
    "searchString": "John Doe ExtraTextHere"
    }, {
    "searchString": "Jane Doe OtherName"
    }, {
    "searchString": "Doem Sarah Thisistestdata"
    }])

Create search index 'default' on this collection:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "searchString": {
        "type": "autocomplete"
      }
    }
  }
}

Do the following query:

db.getCollection('testing').aggregate([
  {
    $search: {
      autocomplete: {
        query: "John Doe",
        path: 'searchString',
        fuzzy: {
          maxEdits: 1,
          prefixLength: 4,
          maxExpansions: 20,
        },
      },
    },
  },
]).toArray();

When a user searches for "John Doe", this query returns all the documents that have either "John" OR "Doe" in the path "searchString". In this example, that means all 3 documents. The more words the user types, the more results are returned. This is not expected behaviour. I would expect more words to match less results because the search term gets more precise.

1

There are 1 answers

3
Nice-Guy On

An edgeGram tokenization strategy might be better for your use case because it works left-to-right.

Try this index definition take from the docs:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "searchString": [
        {
          "type": "autocomplete",
          "tokenization": "edgeGram",
          "minGrams": 3,
          "maxGrams": 10,
          "foldDiacritics": true
        }
      ]
    }
  }
}

Also, add change your query clause from must to filter. That will exclude the documents that do not contain all the tokens.