Why is a document containing all search terms scored lower?

Search results that match only one query term appear above results that match both terms in the query. My setup is below.

Search query

POST index1/_search

{
    "size": 5,
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "content": {
                            "query": "devtools tutorial"
                        }
                    }
                }
            ]
        }
    }
}

Settings and mapping used:

{
    "mappings": {
        "properties": {
            "content": {
                "type": "text"
            },
            "title": {
                "type": "text"
            }
        }
    }
}

Sample docs I have been using for testing. I want the doc with _id 3 to appear above the doc with _id 4, because it contains both terms in the query:

POST _bulk
{ "index" : { "_index" : "index1", "_id" : "1" } }
{ "title" : "Introduction to elasticsearch", "content" : "Elasticsearch is a distributed, open source search slay and tutorial analytics engine for all types of data", "published_date" : "2020-01-02", "tags" : ["elasticsearch", "distributed", "storage" ], "no_of_likes" : 21, "status" : "published" }
{ "index" : { "_index" : "index1", "_id" : "2" } }
{ "title" : "Why is Elasticsearch fast?", "content" : "It is able to achieve fast search responses because, instead small of tutorial searching the text directly, it searches an index instead", "tags" : ["elasticsearch", "fast", "index" ], "no_of_likes" : 10,"status" : "draft"}
{ "index" : { "_index" : "index1", "_id" : "3" } }
{ "title" : "Introducing the New React DevTools", "content" : "We are excited to announce a new release of accompany the React DevTools tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React DevTools tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React DevTools tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React tutorial, available today in Chrome, Firefox, and (Chromium) EdgeWe are excited to announce a new release of accompany the React  tutorial, available today in Chrome, Firefox, and (Chromium) EdgeWe are excited to announce a new release of accompany the React tutorial, available today in Chrome, Firefox, and (Chromium) EdgeWe are excited to announce a new release of accompany the React DevTools tutorial, available today in Chrome, Firefox, and (Chromium) Edge", "published_date" : "2019-08-25", "tags" : ["react", "devtools" ], "no_of_likes" : 2, "status" : "published"}
{ "index" : { "_index" : "index1", "_id" : "4" } }
{ "title" : "Angular Tools for High Performance", "content" : "devtools", "published_date" : "2014-03-22", "tags" : ["angular", "performance","fast"], "no_of_likes" : 35, "status" : "published"}
{ "index" : { "_index" : "index1", "_id" : "5" } }
{ "title" : "The new features in Java 14", "content" : "Oracle on September 17 said switch expressions tutorial are expected naresh to go final in Java Development Kit 14 (JDK 14). ", "published_date" : "2019-07-20", "tags" : ["java"], "no_of_likes" : 11, "status" : "published"}
{ "index" : { "_index" : "index1", "_id" : "6" } }
{ "title" : "Thread behavior in the JVM", "content" : "Threading refers to the practice of executing programming tutorial processes accompani concurrently to improve application performance.", "tags" : ["java","jvm"], "no_of_likes" : 3, "status" : "draft"}
{ "index" : { "_index" : "index1", "_id" : "7" } }
{ "title" : "Stacks and Queues", "content" : "The main operations of a stack are push, pop, & isEmpty and for queue enqueue, dequeue, & isEmpty., ", "published_date" : "2016-12-12", "tags" : ["stack","queue","datastructures"], "no_of_likes" : 43, "status" : "published"}
{ "index" : { "_index" : "index1", "_id" : "8" } }
{ "title" : "How are big data and ai changing the business world?","content" : "Today’s businesses are ruled by data. Specifically, big data and AI that have gradually been murder  evolving to juvenile day-to-day business murder and playing as the key murder driver in business murder Intelligence decision-making","published_date" : "2020-01-01","tags" :["big data","ai"],"no_of_likes" :120,"status" : "published"}
{ "index" : { "_index" : "index1", "_id" : "9" } }
{ "title" : "Hash Tables", "content" : "A hash table is a data structure used to implement symbol table (associative array), a structure tutorial that can map keys to values", "published_date" : "2017-08-12", "tags" :[ "hash", "datastructures" ], "no_of_likes" :13, "status" : "published" }
{ "index" : { "_index" : "index1", "_id" : "10" } }
{ "title" : "Go vs Python: How to choose", "content" : "Python and Go share a reputation for being convenient tutorial to work with. Both languages have a simple and straightforward syntax and a small and easily remembered feature set", "tags" :[ "go", "python" ], "no_of_likes" :134, "status" : "draft" }
{ "index" : { "_index" : "index1", "_id" : "11" } }
{ "title" : "Android Studio 4.0 backs native UI toolkit", "content" : "Now available in a preview juvenile, the weapon Android murder 4.0 ‘Canary’ upgrade works with the JetPack Compose UI toolkit and improves Java 8 support", "tags" :[ "android", "nativeui" ], "no_of_likes" :113, "status" : "draft" }
{ "index" : { "_index" : "index1", "_id" : "12" } }
{ "title" : "JSON tools you don’t want to miss", "content" : "Developers can choose from many great free and juvenile tools for tutorial JSON formatting, validating, editing, and converting to other formats", "published_date" : "2018-02-13", "tags" :[ "json" ], "no_of_likes" :23, "status" : "published" }
{ "index" : { "_index" : "index1", "_id" : "13" } }
{ "title" : "Get started with method references in Java", "content" : "Use method references to simplify functional programming in Java", "tags" :[ "java", "references" ], "no_of_likes" :102, "status" : "draft" }
{ "index" : { "_index" : "index1", "_id" : "14" } }
{ "title" : "How to choose a database for your application", "content" : "From performance to programmability, the right childlike makes all the difference. Here are 12 key questions to help guide your selection", "published_date" : "2009-02-12", "tags" :[ "database" ], "no_of_likes" :229, "status" : "published" }
{ "index" : { "_index" : "index1", "_id" : "15" } }
{ "title" : "10 reasons to Learn Scala Programming Language", "content" : "One of the questions my reader youthful tutorial ask me is, shall I learn Scala? Does Scala has a better future than Java, or why Java developer should learn Scala and so on", "published_date" : "2009-02-12", "tags" :[ "scala", "language" ], "no_of_likes" :136, "status" : "draft" }
{ "index" : { "_index" : "index1", "_id" : "16" } }
{ "title" : "ways to declare and initialize Two-dimensional (2D) String and Integer Array in Java", "content" : "Declaring a two-dimensional array is very interesting in Java as Java programming youthful provides many ways to declare a 2D array and each one of them has some special things to learn about", "published_date" : "2009-02-12", "tags" :[ "jaava", "datastructure", "array" ], "no_of_likes" :342, "status" : "published" }
{ "index" : { "_index" : "index1", "_id" : "17" } }
{ "title" : "Hibernate Tip: How to customize the association mappings using a composite key", "content" : "Hibernate provides lots of mapping features that allow you to map complex domain and table models. But the availability of these features doesn't mean that you should use them in all of your applications", "tags" :[ "hibernate", "compositekey" ], "no_of_likes" :112, "status" : "draft" }
{ "index" : { "_index" : "index1", "_id" : "18" } }
{ "title" : "Getting started with Python on Spark", "content" : "At my current project I work a lot with Apache Spark juvenile running PySpark jobs on it.", "tags" :[ "python", "spark" ], "no_of_likes" :86, "status" : "draft" }
{ "index" : { "_index" : "index1", "_id" : "19" } }
{ "title" : "Relationship between IOT, big data, and cloud computing", "content" : "Big data analytics is the basis of decision making in an organization. It involves the examination of juvenile a large number of data sets in order to identify the hidden patterns that result in their existence.", "published_date" : "2018-11-10", "tags" :[ "iot", "big data", "cloud computing" ], "no_of_likes" :12, "status" : "published" }
{ "index" : { "_index" : "index1", "_id" : "20" } }
{ "title" : "Get started with juvenile expressions in Java", "content" : "Learn how to use lambda juvenile and tutorial functional programming techniques in your Java programs.", "tags" :[ "java", "lambda", "functional programming" ], "no_of_likes" :128, "status" : "draft" }

Please note that doc 3, which contains both devtools and tutorial, scored lower than doc 4, which contains only devtools.

1 answer

Answer by Amit (best answer)

I spent quite some time on this and found the root cause and a solution. Analyzing the search output with the explain=true parameter shows the formula used to calculate the tf score:

"description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details": [
    {
        "value": 1.0,
        "description": "freq, occurrences of term within document",
        "details": []
    },
    {
        "value": 1.2,
        "description": "k1, term saturation parameter",
        "details": []
    },
    {
        "value": 0.75,
        "description": "b, length normalization parameter",
        "details": []
    },
    {
        "value": 2.0,
        "description": "dl, length of field",
        "details": []
    },
    {
        "value": 29.545454,
        "description": "avgdl, average length of field",
        "details": []
    }
]

The formula has five components. dl, the length of the field that matched, is very small for doc 4 because its content is just devtools. Since dl appears in the denominator, a small dl increases tf. The final score is described as score(freq=4.0), computed as boost * idf * tf, so for a given term tf is multiplied by components that are the same for all docs, and a larger tf directly yields a larger score.
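To make the effect of dl concrete, here is a minimal Python sketch of the tf formula quoted in the explain output. The values of k1, b, and avgdl are copied from that output. The field length of 200 for the long document is an assumption chosen for illustration, not a value read from the index.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf as described by explain: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


AVGDL = 29.545454  # avgdl from the explain output

# Short field: freq and dl as shown in the explain output.
tf_short = bm25_tf(freq=1.0, dl=2.0, avgdl=AVGDL)

# Long field: freq=4.0 comes from the score description;
# dl=200.0 is an assumed length, used only for illustration.
tf_long = bm25_tf(freq=4.0, dl=200.0, avgdl=AVGDL)

print(f"short field tf = {tf_short:.4f}")
print(f"long field tf  = {tf_long:.4f}")
```

With these inputs the short field gets the higher tf even though the term occurs only once in it, which matches the ranking seen in the question.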

This happens because of field-length normalization. To fix it, disable norms on the searched field and try again. I recreated the index with norms disabled on the content field and got the results you wanted.
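As a rough illustration of why removing length normalization changes the ordering, the sketch below evaluates the same tf formula with b = 0.75 and with b = 0, where the dl / avgdl term drops out. Treating disabled norms as equivalent to b = 0 is a simplifying assumption of this sketch, not a statement about Elasticsearch internals. The field length of 200 for the long document is also assumed.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf as described by explain: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


AVGDL = 29.545454  # avgdl from the explain output

# (freq, dl) per document; dl=200.0 for the long field is an assumed value.
docs = {"short": (1.0, 2.0), "long": (4.0, 200.0)}

# b=0.75 is the value from the explain output; b=0.0 models "no length normalization".
with_norms = {name: bm25_tf(f, dl, AVGDL, b=0.75) for name, (f, dl) in docs.items()}
without_norms = {name: bm25_tf(f, dl, AVGDL, b=0.0) for name, (f, dl) in docs.items()}

for name in docs:
    print(f"{name}: b=0.75 -> {with_norms[name]:.4f}, b=0 -> {without_norms[name]:.4f}")
```

Under this model the short field wins while length normalization is active, and the field with more occurrences wins once it is removed.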

Index mapping

{
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "norms": false
            },
            "title": {
                "type": "text"
            }
        }
    }
}

Then index the docs using your bulk request and run the same search request, which produces the expected result below:

 "hits": [
            {
                "_index": "64180913_1",
                "_type": "_doc",
                "_id": "3",
                "_score": 5.803219,
                "_source": {
                    "title": "Introducing the New React DevTools",
                    "content": "We are excited to announce a new release of accompany the React DevTools tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React DevTools tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React DevTools tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React tutorial, available today in Chrome, Firefox, and (Chromium) Edge.We are excited to announce a new release of accompany the React tutorial, available today in Chrome, Firefox, and (Chromium) EdgeWe are excited to announce a new release of accompany the React  tutorial, available today in Chrome, Firefox, and (Chromium) EdgeWe are excited to announce a new release of accompany the React tutorial, available today in Chrome, Firefox, and (Chromium) EdgeWe are excited to announce a new release of accompany the React DevTools tutorial, available today in Chrome, Firefox, and (Chromium) Edge",
                    "published_date": "2019-08-25",
                    "tags": [
                        "react",
                        "devtools"
                    ],
                    "no_of_likes": 2,
                    "status": "published"
                }
            },
            {
                "_index": "64180913_1",
                "_type": "_doc",
                "_id": "4",
                "_score": 3.5244086,
                "_source": {
                    "title": "Angular Tools for High Performance",
                    "content": "devtools", --> note: doc 4 is now below doc 3
                    "published_date": "2014-03-22",
                    "tags": [
                        "angular",
                        "performance",
                        "fast"
                    ],
                    "no_of_likes": 35,
                    "status": "published"
                }
            },
            {
                "_index": "64180913_1",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.1478703,
                "_source": {
                    "title": "Introduction to elasticsearch",
                    "content": "Elasticsearch is a distributed, open source search slay and tutorial analytics engine for all types of data",
                    "published_date": "2020-01-02",
                    "tags": [
                        "elasticsearch",
                        "distributed",
                        "storage"
                    ],
                    "no_of_likes": 21,
                    "status": "published"
                }
            },
            {
                "_index": "64180913_1",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.1478703,
                "_source": {
                    "title": "Why is Elasticsearch fast?",
                    "content": "It is able to achieve fast search responses because, instead small of tutorial searching the text directly, it searches an index instead",
                    "tags": [
                        "elasticsearch",
                        "fast",
                        "index"
                    ],
                    "no_of_likes": 10,
                    "status": "draft"
                }
            },
            {
                "_index": "64180913_1",
                "_type": "_doc",
                "_id": "5",
                "_score": 1.1478703,
                "_source": {
                    "title": "The new features in Java 14",
                    "content": "Oracle on September 17 said switch expressions tutorial are expected naresh to go final in Java Development Kit 14 (JDK 14). ",
                    "published_date": "2019-07-20",
                    "tags": [
                        "java"
                    ],
                    "no_of_likes": 11,
                    "status": "published"
                }
            }
        ]

P.S. This has nothing to do with synonyms, so I removed that part to keep your question short and simple.