MongoDB Atlas Search - Multiple terms in search-string with 'and' condition (not 'or')

Question

MongoDB Atlas Search - Multiple terms in search-string with 'and' condition (not 'or')

2.9k views Asked by Lowie At 19 October 2020 at 13:50

In the documentation of MongoDB Atlas search, it says the following for the autocomplete operator:

query: String or strings to search for. If there are multiple terms in a string, Atlas Search also looks for a match for each term in the string separately.

For the text operator, the same thing applies:

query: The string or strings to search for. If there are multiple terms in a string, Atlas Search also looks for a match for each term in the string separately.

Matching each term separately seems odd behaviour to me. We need multiple searches in our app, and for each we expect less results the more words you type, not more.

Example: When searching for "John Doe", I expect only results with both "John" and "Doe". Currently, I get results that match either "John" or "Doe".

Is this not possible using MongoDB Atlas Search, or am I doing something wrong?

Update Currently, I have solved it by splitting the search-term on space (' ') and adding each individual keyword to a separate must-sub-clause (with the compound operator). However, then the search query no longer returns any results if there is one keyword with only one character. To account for that, I split keywords with one character from those with multiple characters. The snippet below works, but for this I need to save two generated fields on each document:

searchString: a string with all the searchable fields concatenated. F.e. "John Doe Man Streetstreet Citycity"
searchArray: the above string uppercased & split on space (' ') into an array

const must = [];
const searchTerms = 'John D'.split(' ');
for (let i = 0; i < searchTerms.length; i += 1) {
    if (searchTerms[i].length === 1) {
      must.push({
        regex: {
          path: 'searchArray',
          query: `${searchTerms[i].toUpperCase()}.*`,
        },
      });
    } else if (searchTerms[i].length > 1) {
      must.push({
        autocomplete: {
          query: searchTerms[i],
          path: 'searchString',
          fuzzy: {
            maxEdits: 1,
            prefixLength: 4,
            maxExpansions: 20,
           },
         },
       });
    }
}
db.getCollection('someCollection').aggregate([
  {
    $search: {
      compound: { must },
    },
  },
]).toArray();

Update 2 - Full example of unexpected behaviour

Create collection with following documents:

db.getCollection('testing').insertMany([{
    "searchString": "John Doe ExtraTextHere"
    }, {
    "searchString": "Jane Doe OtherName"
    }, {
    "searchString": "Doem Sarah Thisistestdata"
    }])

Create search index 'default' on this collection:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "searchString": {
        "type": "autocomplete"
      }
    }
  }
}

Do the following query:

db.getCollection('testing').aggregate([
  {
    $search: {
      autocomplete: {
        query: "John Doe",
        path: 'searchString',
        fuzzy: {
          maxEdits: 1,
          prefixLength: 4,
          maxExpansions: 20,
        },
      },
    },
  },
]).toArray();

When a user searches for "John Doe", this query returns all the documents that have either "John" OR "Doe" in the path "searchString". In this example, that means all 3 documents. The more words the user types, the more results are returned. This is not expected behaviour. I would expect more words to match less results because the search term gets more precise.

Original Q&A

There are 1 answers

**Nice-Guy** · Answer 1 · 2020-11-05T02:40:51+00:00

An edgeGram tokenization strategy might be better for your use case because it works left-to-right.

Try this index definition take from the docs:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "searchString": [
        {
          "type": "autocomplete",
          "tokenization": "edgeGram",
          "minGrams": 3,
          "maxGrams": 10,
          "foldDiacritics": true
        }
      ]
    }
  }
}

Also, add change your query clause from must to filter. That will exclude the documents that do not contain all the tokens.

TechQA.

MongoDB Atlas Search - Multiple terms in search-string with 'and' condition (not 'or')

There are 1 answers

Related Questions in MONGODB

Related Questions in SEARCH

Related Questions in AUTOCOMPLETE

Related Questions in MONGODB-ATLAS

Related Questions in MONGODB-ATLAS-SEARCH

Popular Questions

Popular Tags

Trending Questions