How to sort Elasticsearch documents by the number of nested documents they contain?


I am using the free tier of Bonsai and am trying to write a script to manage the number of documents in my Elasticsearch index. To maximize the number of documents I can store, I would like to start by removing the documents that contain the most nested documents.

Example:

{
  "title": "Spiderman saves child from well",
  "body": "Move over, Lassie! New York has a new hero. But is he also a menace?",
  "authors": [
    {
      "name": "Jonah Jameson",
      "title": "Sr. Editor"
    },
    {
      "name": "Peter Parker",
      "title": "Photos"
    }
  ],
  "comments": [
    {
      "username": "captain_usa",
      "comment": "I understood that reference!"
    },
    {
      "username": "man_of_iron",
      "comment": "Congrats on being slightly more useful than a ladder."
    }
  ],
  "photos": [
    {
      "url": "https://assets.dailybugle.com/12345",
      "caption": "Spiderman delivering Timmy back to his mother"
    }
  ]
}

Is there anything in Elasticsearch that would tell me that this document really counts as 6 documents because of the nesting? Ideally, I would be able to sort the documents in my index by this "document count".

Thanks!

1 Answer
Joe - Check out my books

If your authors, comments and photos fields are either plain arrays of objects or mapped with the dedicated Elasticsearch nested data type, you can sort with a script like the following:

GET bonsai/_search
{
  "_source": false,
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "source": """
            // Start at 1 to count the top-level document itself.
            def count = 1;
            // Add the length of every array field found in _source.
            for (def entry : params._source.values()) {
              if (entry instanceof ArrayList) {
                count += entry.size();
              }
            }
            return count;
          """
        }
      }
    }
  ]
}
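
A script sort defaults to ascending order, so the request above lists the smallest documents first. If the goal is to find the heaviest documents for cleanup, descending order plus a size limit may be more useful. Here is a minimal Python sketch that only builds such a request body. The names build_sort_body, order and size are hypothetical helpers of mine, and sending the body to the cluster is left to whichever client you use.

```python
import json

# Same counting logic as the Painless script in the request above.
PAINLESS_SOURCE = """
def count = 1;
for (def entry : params._source.values()) {
  if (entry instanceof ArrayList) {
    count += entry.size();
  }
}
return count;
"""


def build_sort_body(order="desc", size=10):
    """Build a _search body that sorts hits by the script's count.

    `order` and `size` are illustrative parameters, not from the original answer.
    """
    return {
        "_source": False,  # skip returning document bodies in the hits
        "size": size,      # how many hits to return
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "order": order,  # "desc" puts the largest counts first
                    "script": {"lang": "painless", "source": PAINLESS_SOURCE},
                }
            }
        ],
    }


body = build_sort_body()
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of GET bonsai/_search. Each hit's "sort" array then carries the computed count, so a cleanup script can read the count and the _id together.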

The example document contains 5 nested objects (2 authors, 2 comments and 1 photo), so I presume you got to 6 by counting the top-level doc too. Feel free to start counting at 0 in the script if you only want the nested objects.
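
To sanity-check the arithmetic without a cluster, the same counting rule can be mirrored locally. This is only a rough Python sketch of what the script does. The function name count_with_nested and its start parameter are hypothetical, and like the script it counts every array field, not just arrays of objects.

```python
def count_with_nested(source, start=1):
    """Mirror of the sort script: start value plus the length of every array field."""
    count = start
    for value in source.values():
        if isinstance(value, list):
            count += len(value)
    return count


# The example document from the question.
example = {
    "title": "Spiderman saves child from well",
    "body": "Move over, Lassie! New York has a new hero. But is he also a menace?",
    "authors": [
        {"name": "Jonah Jameson", "title": "Sr. Editor"},
        {"name": "Peter Parker", "title": "Photos"},
    ],
    "comments": [
        {"username": "captain_usa", "comment": "I understood that reference!"},
        {"username": "man_of_iron",
         "comment": "Congrats on being slightly more useful than a ladder."},
    ],
    "photos": [
        {"url": "https://assets.dailybugle.com/12345",
         "caption": "Spiderman delivering Timmy back to his mother"},
    ],
}

print(count_with_nested(example))           # 6: top-level doc + 5 nested objects
print(count_with_nested(example, start=0))  # 5: nested objects only
```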