elastic search match query over array object

6.1k views Asked by At

Suppose i've 3 doc

doc_1 = {
    "citedIn": [
        "Bar Councils Act, 1926 - Section 15",
        "Contract Act, 1872 - Section 23"
    ]
}

doc_2 = {
    "citedIn":[
        "15 C. B 400", 
        "Contract Act, 1872 - Section 55"
    ]
}

doc_3 = {
    "citedIn":[
        "15 C. B 400", 
        "Contract Act, 1872 - Section 15"
    ]
}

Here citedIn field is a array object.Now i want run a stander match query

{
    "query":
    {
        "match": {"citedIn":{"query": "Contract act 15" , "operator":"and" }}
    }

}

The above query return all of the 3 doc, but it suppose to return doc_3 as only doc_3 contain Contract, act and 15 together in a single array element .

How would i achieve this ?

Any suggestion/Solution would be preferable

Nested Data Type Update :

i did try nested field. This Is my mapping

{
    "mappings": {
        "properties": {
            "citedIn": {
                "type": "nested",
                "include_in_parent": true,
                "properties": {
                    "someFiled": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            }
        }
    }
}

This is my data

doc_1 = {
    "citedIn": [
        {"someFiled" : "Bar Councils Act, 1926 - Section 15"},
        {"someFiled" : "Contract Act, 1872 - Section 23"}
    ]
}

doc_2 = {
    "citedIn":[
        {"someFiled" : "15 C. B 400"}
        {"someFiled" : "Contract Act, 1872 - Section 55"}
    ]
}

doc_3 = {
    "citedIn":[
        {"someFiled" : "15 C. B 400"},
        {"someFiled" : "Contract Act, 1872 - Section 15"}
    ]
}

This is my query

{
    "query":
    {

        "match": {"citedIn.someFiled":{"query": "Contract act 15" , "operator":"and" }}
            
        
    }
}

But still getting same result

2

There are 2 answers

6
ESCoder On BEST ANSWER

Adding a working example with index data, mapping,search query, and search result.

You need to use nested query to search on nested fields

Index Mapping

{
    "mappings": {
        "properties": {
            "citedIn": {
                "type": "nested"
            }
        }
    }
}

Index Data:

 {
        "citedIn": [
            {
                "someFiled": "Bar Councils Act, 1926 - Section 15"
            },
            {
                "someFiled": "Contract Act, 1872 - Section 23"
            }
        ]
    }
    {
        "citedIn": [
            {
                "someFiled": "15 C. B 400"
            },
            {
                "someFiled": "Contract Act, 1872 - Section 55"
            }
        ]
    }
    {
        "citedIn": [
            {
                "someFiled": "15 C. B 400"
            },
            {
                "someFiled": "Contract Act, 1872 - Section 15"
            }
        ]
    }

Search Query:

{
    "query": {
        "nested": {
            "path": "citedIn",
            "query": {
                "bool": {
                    "must": [
                        {
                            "match": {
                                "citedIn.someFiled": "contract"
                            }
                        },
                        {
                            "match": {
                                "citedIn.someFiled": "act"
                            }
                        },
                        {
                            "match": {
                                "citedIn.someFiled": 15
                            }
                        }
                    ]
                }
            },
            "inner_hits": {}
        }
    }
}

Search Result:

"inner_hits": {
          "citedIn": {
            "hits": {
              "total": {
                "value": 1,
                "relation": "eq"
              },
              "max_score": 1.620718,
              "hits": [
                {
                  "_index": "stof_64170705",
                  "_type": "_doc",
                  "_id": "3",
                  "_nested": {
                    "field": "citedIn",
                    "offset": 1
                  },
                  "_score": 1.620718,
                  "_source": {
                    "someFiled": "Contract Act, 1872 - Section 15"
                  }
                }
              ]
            }
          }
        }
      }
2
Amit On

There is no way for you to achieve this as what you are indexing is a array to strings in your citedIn field, and as all Elasticsearch fields are multi-valued by default as it was designed in Lucene that way, and elasticsearch is built on top of Lucene search library.

Please read arrays in elasticsearch for more info, especially the last important note as shown in below image:

enter image description here

As explained in above image, all your strings in your array actually part of same field, hence there is no way for ES to identify whether your search string was part of same string in array or not, due to which you were getting all the docs in search.

Unless you index these strings as part of another fields like nested fields, but for that you need to give the name of fields and its like a map where key is your field name and value is field value and than you query on field names, you wouldn't be able to achieve your use-case.