Need an Elasticsearch query that will restrict results to those that have the same value in one field but different values in another field

92 views Asked by At

I'm guessing this is most likely going to be an aggregation because I don't know the value of one particular field - the field that needs to be compared with other results. I'm trying to create a list of results where: fieldA in Result1 has the exact same value as fieldA in Result2 (I don't care what it is) AND where 1 Result1 has "blue" in fieldB and Result2 has "green" in fieldB (I do care that "blue is present in one and "green" is present in the other. Am I correct that this is only achievable through aggregations? If so, what would that aggregation script look like (I'm assuming it's a script too). Thanks.

So, here's some sample data:

[
    {
        "record_id": "1",
        "record_type": "typeA",
        "field_a": "1111111111",
        "field_b": "blue"
    },
    {
        "record_id": "2",
        "record_type": "typeA",
        "field_a": "1111111111",
        "field_b": "green"
    },
    {
        "record_id": "3",
        "record_type": "typeA",
        "field_a": "2222222222",
        "field_b": "blue"
    },
    {
        "record_id": "4",
        "record_type": "typeA",
        "field_a": "2222222222",
        "field_b": "yellow"
    }
]

My query only knows the value of "record_type". What I need is a list of "field_a" values that match where there are at least 2 results where 1 has "blue" in "field_b" and 1 has "green" in "field_b".

So, in this example, I would like to know that "1111111111" matches that criteria - there is 1 with "blue" and 1 with "green", while "2222222222" does NOT match since there is 1 with "blue" and 1 with "yellow" (i.e., not "green").

I know I can do an aggregation that, for instance, counts the values in "field_a" for my results.

However, there can be 1 - N with the same value in "field_a". I specifically need to know if there is at least 1 with "blue" in "field_b" and at least 1 with "green" in "field_b" for a given "field_a" value.

Hopefully that illustrates the problem, but if not here's what I "think" I should be doing - an aggregation of an aggregation:

{
  "size": 0,
   "query": {
      "bool": {
        "must": [
        {
          "query_string": {
            "query": "*",
            "analyze_wildcard": true,
            "default_field": "*"
          }
        }
      ],
    "filter" : [
      {
        "terms" : {
          "record_type.keyword" : [
            "typeA"
          ],
          "boost" : 1.0
        }
      }
    ]
      }
   },
   "aggs": {
    "uniq_field_a_values": {
      "terms": {
        "field": "field_a.keyword",
        "size" : 10000
      }
    },
    "aggs": {
      [ what should this look like???? ]
    }
  }
}
1

There are 1 answers

0
David Steiner On BEST ANSWER

I was able to get what I wanted by doing the following where I have "[ what should this look like???? ] written above:

      "aggs": {
        "blue": {
          "filter": {
            "term": {
              "fieldB.keyword": "blue"
            }
          }
        },
        "green": {
          "filter": {
            "term": {
              "fieldB.keyword": "green"
            }
          }
        },
        "both": {
          "bucket_script": {
            "buckets_path": {
              "blue_count": "blue._count",
              "green_count": "green._count"
            },
            "script": "if ((params.blue_count > 0) && (params.green_count > 0)) { 1}"
          }
        }