How to write an Elasticsearch query on the basis of distinct objects?


I am trying to get the distinct attribute_name values for a given tenant_id and hierarchy_name. This is the data that is indexed:

       {
      "hits": [
        {
          "_index": "emp_indexs_datas_d_v",
          "_type": "bulkindexing",
          "_id": "84",
          "_source": {
            "id": "2",
            "name": "PRODUCT",
            "values": "GEO"
          }
        },
        {
          "_index": "emp_indexs_datas_d_v",
          "_type": "bulkindexing",
          "_id": "88",
          "_source": {
            "id": "1",
            "name": "CUSTOMER",
            "values": "CUSTOMER_OPEN_1"
          }
        },
        {
          "_index": "emp_indexs_datas_d_v",
          "_type": "bulkindexing",
          "_id": "98",
          "_source": {
            "id": "2",
            "name": "PRODUCT",
            "values": "CUSTOMER_OPEN_2"
          }
        },
        {
          "_index": "emp_indexs_datas_d_v",
          "_type": "bulkindexing",
          "_id": "100",
          "_source": {
            "id": "1",
            "name": "CUSTOMER",
            "values": "CUSTOMER-ALL"
          }
        },
        {
          "_index": "emp_indexs_datas_d_v",
          "_type": "bulkindexing",
          "_id": "99",
          "_source": {
            "id": "2",
            "name": "CUSTOMER",
            "values": "CUSTOMER_OPEN_2"
          }
        }
      ]
    }

This is the query I have been using so far. It returns the distinct attribute_name values for a given hierarchy_name:

{
        "query": {
            "multi_match": {
                "query": "CUSTOMER",
                "fields": [
                    "hierarchy_name"
                ]
            }
        },
        "collapse": {
            "field": "attribute_name.keyword"
        }
    }

Now I also want to match on one more property, tenant_id; previously I was matching only on hierarchy_name. Can someone help me with the query?

Expected output: for example, for tenant_id 2 and hierarchy_name PRODUCT we should get

{
  "hits": [
    {
      "_index": "emp_indexs_datas_d_v",
      "_type": "bulkindexing",
      "_id": "84",
      "_source": {
        "tenant_id": "2",
        "hierarchy_name": "CUSTOMER",
        "attribute_name": "GEO"
      }
    },
    {
      "_index": "emp_indexs_datas_d_v",
      "_type": "bulkindexing",
      "_id": "98",
      "_source": {
        "tenant_id": "2",
        "hierarchy_name": "CUSTOMER",
        "attribute_name": "CUSTOMER_OPEN_2"
      }

    }
  ]
}

There are 2 answers

ESCoder (BEST ANSWER)

You can use a bool query with must clauses to combine multiple conditions:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tenant_id": 2
          }
        },
        {
          "multi_match": {
            "query": "PRODUCT",
            "fields": [
              "hierarchy_name"
            ]
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "attribute_name.keyword"
  }
}

The search result will be:

"hits": [
      {
        "_index": "67379727",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.4144652,
        "_source": {
          "tenant_id": "2",
          "hierarchy_name": "PRODUCT",
          "attribute_name": "GEO"
        },
        "fields": {
          "attribute_name.keyword": [
            "GEO"
          ]
        }
      },
      {
        "_index": "67379727",
        "_type": "_doc",
        "_id": "3",
        "_score": 1.4144652,
        "_source": {
          "tenant_id": "2",
          "hierarchy_name": "PRODUCT",
          "attribute_name": "CUSTOMER_OPEN_2"
        },
        "fields": {
          "attribute_name.keyword": [
            "CUSTOMER_OPEN_2"
          ]
        }
      }
    ]
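
Since the goal here is to select documents rather than rank them, the same two conditions could also go in the bool query's filter clause, which skips scoring. This is a variation on the query above, sketched with the field names from the question; it is not something the original answer uses:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "match": { "tenant_id": 2 } },
        { "match": { "hierarchy_name": "PRODUCT" } }
      ]
    }
  },
  "collapse": {
    "field": "attribute_name.keyword"
  }
}
```

With only filter clauses, every hit comes back with a _score of 0.0 instead of the values shown above; the collapsed set of documents should be the same.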
tomr

Here's another approach, which differs from the accepted answer in three ways:

  • The analyzed match query is replaced by a non-analyzed term query. Analyzed queries can produce surprising results (see the match docs for an explanation).
  • The multi-match query is replaced by a term query. Using multi-match for a single field is redundant and harder to read, and it is another analyzed query.
  • collapse is replaced with a terms aggregation. This is just the way I've always done it.

Using a terms agg to get all the values of attribute_name.keyword means that the number of distinct values returned is capped by the aggregation's size setting. This can be worked around by using a composite aggregation, which pages through all buckets. I don't know whether the same concern applies to collapse, but if you have a large number of distinct values then it's probably wise to check.

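The query using term queries and a terms agg (term queries match the exact indexed value, so if hierarchy_name is mapped as an analyzed text field, target hierarchy_name.keyword instead):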

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "tenant_id": 2
          }
        },
        {
          "term": {
            "hierarchy_name": "PRODUCT"
          }
        }
      ]
    }
  },
  "aggs": {
    "distinct_attribute_names": {
      "terms": {
        "field": "attribute_name.keyword",
        "size": 1000
      }
    }
  },
  "size": 0
}
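
The composite aggregation mentioned above as a way around the bucket limit could look roughly like this. It is a minimal sketch that reuses the same query clause and field names; the aggregation name distinct_attribute_names and the source name attribute_name are just labels chosen for illustration:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "term": { "tenant_id": 2 } },
        { "term": { "hierarchy_name": "PRODUCT" } }
      ]
    }
  },
  "aggs": {
    "distinct_attribute_names": {
      "composite": {
        "size": 1000,
        "sources": [
          { "attribute_name": { "terms": { "field": "attribute_name.keyword" } } }
        ]
      }
    }
  }
}
```

Each response contains up to size buckets plus an after_key. To fetch the next page, send the same request again with that value as "after" inside the composite block, and repeat until no buckets are returned.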