I am running JSON facet (json.facet) queries in Solr 5.3 through the SolrJ API, and each query has many nested subfacets (they can be 4-5 fields deep). The index holds more than 8 million documents. I extract the facet results from the response into a NamedList object.
However, I want the facet results paginated. Right now I get the whole result in one go, which could run out of memory in the future.
{
  "ComputerName_s": {
    "allBuckets": true,
    "type": "terms",
    "field": "ComputerName_s",
    "limit": -1,
    "facet": {
      "ProcessName_s": {
        "type": "terms",
        "field": "ProcessName_s",
        "limit": -1,
        "facet": {
          "PID_i": {
            "type": "terms",
            "field": "PID_i",
            "limit": -1,
            "facet": {
              "timestamp": {
                "type": "terms",
                "field": "timestamp",
                "limit": -1,
                "facet": {
                  "max(Memory_l)": "max(Memory_l)",
                  "avg(Memory_l)": "avg(Memory_l)",
                  "min(ElapsedTime_l)": "min(ElapsedTime_l)"
                }
              }
            }
          }
        }
      }
    }
  }
}
The above is a sample of my json.facet query. How can I set an offset and limit for each field so that I get paginated results?
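For context, this is roughly how the query is issued (a simplified sketch; the Solr URL and core name are placeholders, and jsonFacetString stands for the JSON shown above as a single String):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class FacetQueryExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and core name.
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");

        String jsonFacetString = "...";   // the json.facet JSON shown above, as one String

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);                          // documents are not needed, only the facets
        q.add("json.facet", jsonFacetString);  // attach the json.facet request

        QueryResponse rsp = client.query(q);
        // json.facet results come back under the top-level "facets" entry
        NamedList<Object> facets = (NamedList<Object>) rsp.getResponse().get("facets");
        System.out.println(facets);

        client.close();
    }
}
```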
Also, can I get the facet results in CSV format instead of this complicated, tree-like NamedList structure? I am trying to simulate SQL's "group by" clause, and converting the tree into a row-by-row structure is very time-consuming.
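To illustrate what I mean by row-by-row: the conversion is essentially a recursive walk over the nested buckets, something like this simplified sketch (it assumes each facet level comes back as a NamedList containing a "buckets" List, and that the three stats sit on the innermost timestamp bucket; error handling is omitted):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.util.NamedList;

public class FacetFlattener {

    // Facet fields in nesting order, matching the json.facet above.
    private static final String[] LEVELS =
        {"ComputerName_s", "ProcessName_s", "PID_i", "timestamp"};

    // Stats that sit on the innermost (timestamp) bucket.
    private static final String[] STATS =
        {"max(Memory_l)", "avg(Memory_l)", "min(ElapsedTime_l)"};

    /** Turns the nested facet tree into flat rows: one row per leaf bucket. */
    public static List<List<String>> flatten(NamedList<?> facetsRoot) {
        List<List<String>> rows = new ArrayList<>();
        walk(facetsRoot, 0, new ArrayList<String>(), rows);
        return rows;
    }

    private static void walk(NamedList<?> node, int depth,
                             List<String> prefix, List<List<String>> rows) {
        if (depth == LEVELS.length) {
            // Leaf bucket: emit one row = facet values followed by the aggregated stats.
            List<String> row = new ArrayList<>(prefix);
            for (String stat : STATS) {
                row.add(String.valueOf(node.get(stat)));
            }
            rows.add(row);
            return;
        }
        Object sub = node.get(LEVELS[depth]);
        if (!(sub instanceof NamedList)) {
            return;                                   // facet missing at this level
        }
        Object buckets = ((NamedList<?>) sub).get("buckets");
        if (!(buckets instanceof List)) {
            return;
        }
        for (Object b : (List<?>) buckets) {
            NamedList<?> bucket = (NamedList<?>) b;   // each bucket holds "val", "count", subfacets
            prefix.add(String.valueOf(bucket.get("val")));
            walk(bucket, depth + 1, prefix, rows);
            prefix.remove(prefix.size() - 1);
        }
    }
}
```

Each resulting row (ComputerName, ProcessName, PID, timestamp, max, avg, min) then maps directly to one "group by" line that can be written out as CSV.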
As far as I know, there is no deep-pagination support for faceting, only standard paging (which is horribly inefficient).
You can use the facet.offset and facet.limit parameters to page through facet results.
However, Solr recalculates the results for each request, so asking for the second page makes Solr recompute the first page's data as well (which is what I meant by inefficient).
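With the JSON Facet API you are using, the equivalent knobs are the offset and limit keys on a terms facet, so paging the outer facet would look roughly like this (the numbers are only an example; the nested ProcessName_s/PID_i/timestamp facets stay as in your question):

```json
{
  "ComputerName_s": {
    "type": "terms",
    "field": "ComputerName_s",
    "offset": 20,
    "limit": 10
  }
}
```

The same offset and limit keys also work on the nested facets, but they apply within each parent bucket, so there is no single global page across the whole tree.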