I'm guessing this is most likely going to be an aggregation because I don't know the value of one particular field - the field that needs to be compared with other results. I'm trying to create a list of results where: fieldA in Result1 has the exact same value as fieldA in Result2 (I don't care what it is) AND where 1 Result1 has "blue" in fieldB and Result2 has "green" in fieldB (I do care that "blue is present in one and "green" is present in the other. Am I correct that this is only achievable through aggregations? If so, what would that aggregation script look like (I'm assuming it's a script too). Thanks.

So, here's some sample data:

[
    {
        "record_id": "1",
        "record_type": "typeA",
        "field_a": "1111111111",
        "field_b": "blue"
    },
    {
        "record_id": "2",
        "record_type": "typeA",
        "field_a": "1111111111",
        "field_b": "green"
    },
    {
        "record_id": "3",
        "record_type": "typeA",
        "field_a": "2222222222",
        "field_b": "blue"
    },
    {
        "record_id": "4",
        "record_type": "typeA",
        "field_a": "2222222222",
        "field_b": "yellow"
    }
]

My query only knows the value of "record_type". What I need is a list of "field_a" values that match where there are at least 2 results where 1 has "blue" in "field_b" and 1 has "green" in "field_b".

So, in this example, I would like to know that "1111111111" matches that criteria - there is 1 with "blue" and 1 with "green", while "2222222222" does NOT match since there is 1 with "blue" and 1 with "yellow" (i.e., not "green").

I know I can do an aggregation that, for instance, counts the values in "field_a" for my results.

However, there can be 1 - N with the same value in "field_a". I specifically need to know if there is at least 1 with "blue" in "field_b" and at least 1 with "green" in "field_b" for a given "field_a" value.

Hopefully that illustrates the problem, but if not here's what I "think" I should be doing - an aggregation of an aggregation:

{
  "size": 0,
   "query": {
      "bool": {
        "must": [
        {
          "query_string": {
            "query": "*",
            "analyze_wildcard": true,
            "default_field": "*"
          }
        }
      ],
    "filter" : [
      {
        "terms" : {
          "record_type.keyword" : [
            "typeA"
          ],
          "boost" : 1.0
        }
      }
    ]
      }
   },
   "aggs": {
    "uniq_field_a_values": {
      "terms": {
        "field": "field_a.keyword",
        "size" : 10000
      }
    },
    "aggs": {
      [ what should this look like???? ]
    }
  }
}
1

There are 1 answers

0
David Steiner On BEST ANSWER

I was able to get what I wanted by doing the following where I have "[ what should this look like???? ] written above:

      "aggs": {
        "blue": {
          "filter": {
            "term": {
              "fieldB.keyword": "blue"
            }
          }
        },
        "green": {
          "filter": {
            "term": {
              "fieldB.keyword": "green"
            }
          }
        },
        "both": {
          "bucket_script": {
            "buckets_path": {
              "blue_count": "blue._count",
              "green_count": "green._count"
            },
            "script": "if ((params.blue_count > 0) && (params.green_count > 0)) { 1}"
          }
        }