How to get values from sub-aggregation buckets in Java for an Elasticsearch aggregation query result


I have been able to replicate the desired Elasticsearch query in Java with the Elasticsearch High Level REST Client. The problem is that I cannot retrieve the values I want. Before I give the code, I want to state the overarching goal in case there is a much easier solution (it seems like this shouldn't have to be so difficult).

Overarching goal: get number of documents where 'visited'==true for each unique value in the 'recommender' field.

My current status: I have written a query that produces the desired output in Kibana/Elasticsearch, and I have replicated it in Java (verified with searchRequest.source().toString()), but I am unable to access the data I need in the response.

Here is the query:

{
  "aggs":{
    "recommenderIDs": {
      "terms": {
        "field": "recommender"
      },
      "aggs": {
        "visit_stats": {
          "filters": {
            "filters": {
              "visited": {
                "match":{
                  "visited": true
                }
              }
            }
          }
        }
      }
    }
  }
}

And this is what I have in my Java code:

// ...
        SearchRequest searchRequest = new SearchRequest(INDEX_REC_RECOMMENDATIONS);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        String aggregationName = "recommenderId";
        String filterName = "wasVisited";
        TermsAggregationBuilder aggQuery = AggregationBuilders
                .terms(aggregationName)
                .field(RecommendationRepoFieldNames.RECOMMENDER);
        AggregationBuilder aggFilters = AggregationBuilders.filters(
                filterName,
                new FiltersAggregator.KeyedFilter(
                        RecommendationRepoFieldNames.RECOMMENDER,
                        QueryBuilders.termQuery(RecommendationRepoFieldNames.VISITED, true))
        );
        aggQuery.subAggregation(aggFilters);
        searchSourceBuilder.aggregation(aggQuery);
        searchRequest.source(searchSourceBuilder);
//        System.out.println(searchRequest.source().toString());
        try {
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            Aggregations aggregations = searchResponse.getAggregations();
            Terms byRecommenderId = aggregations.get(aggregationName);
            Filters filterResponses = searchResponse.getAggregations().get(aggregationName);
//            for (Filters.Bucket entry : filterResponses.getBuckets()) {
//                String key = entry.getKeyAsString();
//            }
            for (Terms.Bucket bucket : byRecommenderId.getBuckets()) {
                String bucketKey = bucket.getKeyAsString();
                long totalDocs = bucket.getDocCount();
                Aggregation visitedDocs = bucket.getAggregations().get(filterName);
                //long visitedDocsCount = visitedDocs.getValue();
                System.out.println();
            }
        } catch (IOException e) { //...

I've been fiddling with this all day and cannot make any progress. It's especially frustrating because I can see the doc count for each recommender bucket when I am debugging in my IDE, but I have no idea how to access it. I realize that there are approximately 180 classes extending Aggregation and I have tried a few, but failed every time.

Additionally, if you know any decent resource for the elasticsearch java high level rest client, please let me know. Thank you!

---------EDIT 5/4/21 -------------

Example output from elasticsearch:

// searchResponse (documents returned have been truncated to show only part we are interested in)

  "aggregations": {
    "sterms#recommenderId": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "AdjacentActivityRecommender",
          "doc_count": 3,
          "filters#wasVisited": {
            "buckets": {
              "recommender": {
                "doc_count": 2
              }
            }
          }
        },
        {
          "key": "DefaultProfileDBRecommender",
          "doc_count": 2,
          "filters#wasVisited": {
            "buckets": {
              "recommender": {
                "doc_count": 2
              }
            }
          }
        },
        {
          "key": "PSTR_SC_DI",
          "doc_count": 2,
          "filters#wasVisited": {
            "buckets": {
              "recommender": {
                "doc_count": 1
              }
            }
          }
        },
        {
          "key": "SignificantCategories",
          "doc_count": 2,
          "filters#wasVisited": {
            "buckets": {
              "recommender": {
                "doc_count": 2
              }
            }
          }
        }
      ]
    }
  }

searchResponse.getAggregations() is then saved to aggregations. Eventually, we are able to loop through the buckets for each recommender ID, but I am never able to get into the aggregations inside each bucket, which is what I need to do.

There is 1 answer

Answer by redgrengrumbholt

Solution code posted below:

    try {
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        Aggregations aggregations = searchResponse.getAggregations();
        Terms byRecommenderId = aggregations.get(aggregationName);
        for (Terms.Bucket bucket : byRecommenderId.getBuckets()) {
            String recommenderId = bucket.getKeyAsString();
            double totalDocs = bucket.getDocCount();
            // next two lines are the solution:
            Aggregations subAggregations = bucket.getAggregations();
            Filters byWasVisited = subAggregations.get(filterName);
            // always only one item from getBuckets()
            double totalVisited = byWasVisited.getBuckets().get(0).getDocCount();
            double percentVisited = totalVisited / totalDocs;
            recommenderViews.put(recommenderId, percentVisited);
        } 
        // ...

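The issue was that I needed to extract the next inner level of aggregations (subAggregations), which is done by calling getAggregations() once more, this time on each bucket inside the loop. From there, we call get(filterName) on subAggregations and assign the result to a Filters variable rather than the base Aggregation type, which exposes getBuckets() and, through each bucket, getDocCount().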
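To sanity-check the arithmetic against the sample response in the question, here is a minimal sketch that mirrors the nesting (terms bucket, then the "wasVisited" sub-aggregation, then the keyed "recommender" bucket) using plain Java collections. The class VisitedFractionSketch, the TermsBucket type, and the helper methods are hypothetical stand-ins invented for illustration; they are not Elasticsearch classes, and no client library is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class VisitedFractionSketch {

    /** Hypothetical stand-in for one bucket of the outer terms aggregation. */
    public static final class TermsBucket {
        final String key;
        final long docCount;
        // sub-aggregation name -> (filter key -> doc_count)
        final Map<String, Map<String, Long>> subAggregations;

        TermsBucket(String key, long docCount,
                    Map<String, Map<String, Long>> subAggregations) {
            this.key = key;
            this.docCount = docCount;
            this.subAggregations = subAggregations;
        }
    }

    /** Builds a bucket shaped like the ones in the sample response. */
    public static TermsBucket bucket(String key, long docCount, long visitedCount) {
        Map<String, Long> keyedFilterBuckets = new LinkedHashMap<>();
        keyedFilterBuckets.put("recommender", visitedCount);
        Map<String, Map<String, Long>> subAggs = new LinkedHashMap<>();
        subAggs.put("wasVisited", keyedFilterBuckets);
        return new TermsBucket(key, docCount, subAggs);
    }

    /** The four buckets from the sample response in the question. */
    public static List<TermsBucket> sampleBuckets() {
        List<TermsBucket> buckets = new ArrayList<>();
        buckets.add(bucket("AdjacentActivityRecommender", 3, 2));
        buckets.add(bucket("DefaultProfileDBRecommender", 2, 2));
        buckets.add(bucket("PSTR_SC_DI", 2, 1));
        buckets.add(bucket("SignificantCategories", 2, 2));
        return buckets;
    }

    /** For each outer bucket, step into the inner level and divide visited by total. */
    public static Map<String, Double> visitedFractions(List<TermsBucket> buckets) {
        Map<String, Double> fractions = new LinkedHashMap<>();
        for (TermsBucket b : buckets) {
            // Inner level: look up the sub-aggregation by name, then the bucket by key.
            Map<String, Long> wasVisited = b.subAggregations.get("wasVisited");
            long visited = wasVisited.get("recommender");
            double fraction = b.docCount == 0 ? 0.0 : (double) visited / b.docCount;
            fractions.put(b.key, fraction);
        }
        return fractions;
    }

    public static void main(String[] args) {
        visitedFractions(sampleBuckets())
                .forEach((key, fraction) -> System.out.println(key + " -> " + fraction));
    }
}
```

Note that the result is a fraction between 0 and 1 (for example 0.5 for PSTR_SC_DI, with 1 of 2 documents visited), so it would need multiplying by 100 to be a true percentage. The lookup by the key "recommender" here corresponds to the keyed filter in the real response; in the client, Filters.getBucketByKey(String) should offer the same by-key access as an alternative to getBuckets().get(0).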