How to match related data in Elasticsearch when a keyword is typed incorrectly


I have a document whose title is "Hard work & Success", and I need to be able to find it by searching. If I type "Hardwork" (without a space), no results are returned, but if I type "hard work", the document is returned.

This is the query I have used:

const search = qObject.search;
const payload = {
  from: skip,
  size: limit,
  _source: [
    "id",
    "title",
    "thumbnailUrl",
    "youtubeUrl",
    "speaker",
    "standards",
    "topics",
    "schoolDetails",
    "uploadTime",
    "schoolName",
    "description",
    "studentDetails",
    "studentId"
  ],
  query: {
    bool: {
      must: {
        multi_match: {
          fields: [
            "title^2",
            "standards.standard^2",
            "speaker^2",
            "schoolDetails.schoolName^2",
            "hashtags^2",
            "topics.topic^2",
            "studentDetails.studentName^2",
          ],
          query: search,
          fuzziness: "AUTO",
        },
      },
    },
  },
};

If I search for "hard work" (with a space), it returns data like this:

"searchResults": [
        {
            "_id": "92",
            "_score": 19.04531,
            "_source": {
                "standards": {
                    "standard": "3",
                    "categoryType": "STANDARD",
                    "categoryId": "S3"
                },
                "schoolDetails": {
                    "categoryType": "SCHOOL",
                    "schoolId": "TPS123",
                    "schoolType": "PUBLIC",
                    "logo": "91748922mn8bo9krcx71.png",
                    "schoolName": "Carmel CMI Public School"
                },
                "studentDetails": {
                    "studentId": 270,
                    "studentDp": "164646972124244.jpg",
                    "studentName": "Nelvin",
                    "about": "good student"
                },
                "topics": {
                    "categoryType": "TOPIC",
                    "topic": "Motivation",
                    "categoryId": "MY"
                },
                "youtubeUrl": "https://www.youtube.com/watch?v=wermQ",
                "speaker": "Anna Maria Siby",
                "description": "How hardwork leads to success - motivational talk by Anna",
                "id": 92,
                "uploadTime": "2022-03-17T10:59:59.400Z",
                "title": "Hard work & Success"
            }
        }
]

If I search for the keyword "Hardwork" (without a space), this document is not found. Either a space has to be inserted into the keyword, or related data has to match the search keyword somehow. Is there a solution for this?
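For context on why "fuzziness": "AUTO" does not help here, assuming the title field uses the default standard analyzer: the title is indexed as the separate terms "hard", "work" and "success", while the query "Hardwork" becomes the single term "hardwork". With AUTO, a term of 6 or more characters may differ by at most 2 edits, and "hardwork" is 4 edits away from both "hard" and "work". The JavaScript below is a rough, hypothetical model of that check, not Elasticsearch code: the helper names are made up, the regex split only approximates the standard tokenizer, and it uses plain Levenshtein distance (Elasticsearch also counts a transposition as a single edit by default, which does not change this case).

```javascript
// Hypothetical model of multi_match + fuzziness "AUTO" on a field that uses
// the default standard analyzer. Not Elasticsearch code.

// Rough stand-in for the standard analyzer: lowercase and split on anything
// that is not a letter or digit ("&" yields no token).
function analyze(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Plain Levenshtein distance: insertions, deletions, substitutions.
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + cost // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// fuzziness "AUTO" (same as AUTO:3,6): length 0-2 exact, 3-5 one edit, 6+ two edits.
function autoFuzziness(term) {
  if (term.length < 3) return 0;
  if (term.length < 6) return 1;
  return 2;
}

// With the default OR operator, one query term within the allowed edits is enough.
function fuzzyMatches(query, fieldValue) {
  const fieldTerms = analyze(fieldValue);
  return analyze(query).some((q) =>
    fieldTerms.some((t) => levenshtein(q, t) <= autoFuzziness(q))
  );
}

const docTitle = "Hard work & Success";
console.log(analyze(docTitle).join(" "));        // hard work success
console.log(levenshtein("hardwork", "hard"));    // 4
console.log(fuzzyMatches("Hardwork", docTitle));  // false
console.log(fuzzyMatches("hard work", docTitle)); // true
console.log(fuzzyMatches("Sucess", docTitle));    // true: one edit from "success"
```

In other words, fuzziness covers small typos inside a word ("Sucess"), but not a missing space, because the missing space changes how many terms there are rather than a few characters within one term.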

Accepted answer (rabbitbr):

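I made an example using a custom analyzer with a shingle token filter. Setting "token_separator" to an empty string makes the filter join adjacent words without a space, so "Hard work" is also indexed as the single token "hardwork".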

Mapping:

PUT idx-separator-words
{
  "settings": {
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "max_shingle_size": 4,
          "min_shingle_size": 2,
          "output_unigrams": true,
          "token_separator": ""
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "shingle_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "shingle_analyzer"
      }
    }
  }
}

Then I tested it with your title. Note that the token "hardwork" is generated, but so are other concatenated tokens such as "hardworksuccess" and "worksuccess", which may be a problem for you because they can match queries you did not intend.

GET idx-separator-words/_analyze
{
  "analyzer": "shingle_analyzer",
  "text": ["Hard work & Success"]
}

Results:

{
  "tokens" : [
    {
      "token" : "hard",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "hardwork",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "shingle",
      "position" : 0,
      "positionLength" : 2
    },
    {
      "token" : "hardworksuccess",
      "start_offset" : 0,
      "end_offset" : 19,
      "type" : "shingle",
      "position" : 0,
      "positionLength" : 3
    },
    {
      "token" : "work",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "worksuccess",
      "start_offset" : 5,
      "end_offset" : 19,
      "type" : "shingle",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "success",
      "start_offset" : 12,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
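To check the token list above and what it means at query time, the analysis chain can be approximated in JavaScript. This is a hypothetical sketch, not Elasticsearch code: shingleAnalyze and matches are made-up helpers, the regex split stands in for the standard tokenizer, and matching is a plain exact term lookup (no fuzziness, scoring or positions). Because the mapping sets only "analyzer" on the title field, the same analyzer also runs on the query text, so "Hardwork" is turned into the token "hardwork", which now exists in the index.

```javascript
// Hypothetical approximation of the custom "shingle_analyzer" defined in the mapping.
function shingleAnalyze(text, { minSize = 2, maxSize = 4, separator = "", outputUnigrams = true } = {}) {
  // "lowercase" filter + rough standard tokenizer ("&" produces no token)
  const tokens = text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  const out = [];
  for (let i = 0; i < tokens.length; i++) {
    // output_unigrams: keep the original single-word token
    if (outputUnigrams) out.push(tokens[i]);
    // shingle_filter: join min_shingle_size..max_shingle_size consecutive tokens
    for (let size = minSize; size <= maxSize && i + size <= tokens.length; size++) {
      out.push(tokens.slice(i, i + size).join(separator));
    }
  }
  return out;
}

// Simplified OR-style match: any query token that exists among the indexed tokens.
function matches(query, indexedText) {
  const indexed = new Set(shingleAnalyze(indexedText));
  return shingleAnalyze(query).some((t) => indexed.has(t));
}

const sampleTitle = "Hard work & Success";
console.log(shingleAnalyze(sampleTitle).join(" "));
// hard hardwork hardworksuccess work worksuccess success
console.log(matches("Hardwork", sampleTitle));    // true
console.log(matches("hard work", sampleTitle));   // true
console.log(matches("worksuccess", sampleTitle)); // true: side effect of the extra shingles
```

The last call illustrates the caveat mentioned earlier: every concatenated shingle is a searchable term, so queries such as "worksuccess" also hit the document.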