Pagination in Solr 5.3 JSON facets


I am running JSON facet queries in Solr 5.3 through the SolrJ API, and each query has many sub-facets (up to 4-5 fields deep). The index holds more than 8 million documents. I extract the facet results from the response as a NamedList object.

But I want paginated facet results. Right now I get the whole result in one go, which could run out of memory in the future.

{
   "ComputerName_s":{  
      "allBuckets":true,
      "type":"terms",
      "field":"ComputerName_s",
      "limit":-1,
      "facet":{  
         "ProcessName_s":{  
            "type":"terms",
            "field":"ProcessName_s",
            "limit":-1,
            "facet":{  
               "PID_i":{  
                  "type":"terms",
                  "field":"PID_i",
                  "limit":-1,
                  "facet":{  
                     "timestamp":{  
                        "type":"terms",
                        "field":"timestamp",
                        "limit":-1,
                        "facet":{  
                           "max(Memory_l)":"max(Memory_l)",
                           "avg(Memory_l)":"avg(Memory_l)",
                           "min(ElapsedTime_l)":"min(ElapsedTime_l)"
                        }
                     }
                  }
               }
            }
         }
      }
   }
}

For example, above is a sample json.facet query. How can I set the offset and limit for each field and get paginated results?

Also, can I get the facet result in CSV format instead of the complicated tree-like NamedList structure? I am trying to simulate the SQL "group by" clause, and it is very time-consuming to convert this structure into a row-by-row one.

1 answer

Answered by Uri Shtand

As far as I know, the facet feature has no deep pagination support, only standard offset-based paging (which is horribly inefficient).

You can page through the results with the facet.offset and facet.limit parameters; inside a json.facet terms facet, the equivalent keys are offset and limit.
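As a rough illustration, the helper below builds a json.facet terms facet that asks for one page at a given nesting level. It is only a sketch: the class and method names (FacetPaging, termsFacet) are made up for this example, the page sizes are arbitrary, and sorting by "index asc" is an assumption chosen so that page boundaries do not shift when counts change between requests.

```java
/** Hypothetical helper: builds the body of a json.facet terms facet for one page. */
final class FacetPaging {

    /**
     * Returns a terms facet body with explicit offset and limit.
     * subFacets is the raw JSON object for nested facets, or null for none.
     */
    static String termsFacet(String field, long offset, int limit, String subFacets) {
        if (offset < 0 || limit <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and limit > 0");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("{\"type\":\"terms\",\"field\":\"").append(field).append('"')
          .append(",\"sort\":\"index asc\"")   // assumption: stable ordering across pages
          .append(",\"offset\":").append(offset)
          .append(",\"limit\":").append(limit);
        if (subFacets != null) {
            sb.append(",\"facet\":").append(subFacets);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Each level carries its own offset/limit: first 20 processes per computer,
        // and the third page of 100 computers (offset 200).
        String inner = "{\"ProcessName_s\":" + termsFacet("ProcessName_s", 0, 20, null) + "}";
        String outer = "{\"ComputerName_s\":" + termsFacet("ComputerName_s", 200, 100, inner) + "}";
        System.out.println(outer);
    }
}
```

With SolrJ, the resulting string could then be passed along the lines of query.add("json.facet", outer), assuming query is a SolrQuery.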

However, Solr recalculates the facet for every request, so asking for the second page makes Solr recompute the first page's data as well. That is what I meant by inefficient.
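On the question's second point, turning the nested facet tree into "group by" style rows can be done with a short recursive walk on the client. The sketch below is not a Solr or SolrJ feature. It assumes the facet response has already been converted into nested java.util.Map and List objects, where each terms facet holds a "buckets" list and each bucket has "val", "count", and any stats or sub-facets as further keys. The names FacetFlattener, flatten, and map are hypothetical, and the values are not CSV-quoted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: flattens a nested facet tree into comma-separated rows. */
final class FacetFlattener {

    /** Walks one level; prefix holds the bucket values chosen on the way down. */
    @SuppressWarnings("unchecked")
    static void flatten(Map<String, Object> node, List<String> prefix, List<String> rows) {
        boolean hasSubFacet = false;
        List<String> stats = new ArrayList<>();
        for (Map.Entry<String, Object> e : node.entrySet()) {
            Object v = e.getValue();
            if (v instanceof Map && ((Map<String, Object>) v).get("buckets") instanceof List) {
                // A nested terms facet: recurse into each of its buckets.
                hasSubFacet = true;
                for (Object b : (List<Object>) ((Map<String, Object>) v).get("buckets")) {
                    Map<String, Object> bucket = (Map<String, Object>) b;
                    List<String> next = new ArrayList<>(prefix);
                    next.add(String.valueOf(bucket.get("val")));
                    flatten(bucket, next, rows);
                }
            } else if (!"val".equals(e.getKey()) && !"count".equals(e.getKey())
                    && !(v instanceof Map)) {
                // Anything else that is a plain value is treated as a stat, e.g. max(Memory_l).
                stats.add(String.valueOf(v));
            }
        }
        // Emit a row only at the deepest level, where no further sub-facet exists.
        if (!hasSubFacet && !prefix.isEmpty()) {
            List<String> row = new ArrayList<>(prefix);
            row.addAll(stats);
            rows.add(String.join(",", row)); // note: no quoting of values containing commas
        }
    }

    /** Small builder for ordered maps, used only to create sample data. */
    static Map<String, Object> map(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        // Made-up sample data shaped like a two-level facet response.
        Map<String, Object> proc = map("val", "java", "count", 2, "max(Memory_l)", 512);
        Map<String, Object> host = map("val", "hostA", "count", 2,
                "ProcessName_s", map("buckets", Arrays.asList(proc)));
        Map<String, Object> facets = map("count", 2,
                "ComputerName_s", map("buckets", Arrays.asList(host)));
        List<String> rows = new ArrayList<>();
        flatten(facets, new ArrayList<>(), rows);
        rows.forEach(System.out::println);
    }
}
```

Combined with paging on the outermost facet, each page can be flattened and written out before the next one is requested, so the full tree never has to sit in memory at once.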