I am trying to use Carrot2 for result set clustering and have a couple of questions about it.
a) Can we cluster the documents in Solr/Lucene based on specific Solr fields? For example, cluster them based on name, person name and geo-distance location (lat, long), with specific field weights?
b) My use case for clustering is not really online; it is more of a batch use case. Given that, does the restriction of at most 1K results still apply?
Carrot2 performs clustering based only on the natural text of your documents. Person names would probably be too short for meaningful clustering; Carrot2 is not suitable for geo-distance and other numerical data.
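To illustrate what "natural text only" means in practice, here is a minimal sketch of how one might prepare documents before handing them to a text clusterer. It does not call Carrot2 or Solr; the field names (`name`, `description`, `lat`, `long`) and the helper `to_cluster_text` are hypothetical and only show that numeric fields such as coordinates are left out of the clustering input.

```python
from typing import Any

# Hypothetical field names; replace them with the text fields of your own schema.
TEXT_FIELDS = ("name", "description")


def to_cluster_text(doc: dict, text_fields=TEXT_FIELDS) -> str:
    """Join the non-empty string values of the chosen fields into one text."""
    parts = []
    for field in text_fields:
        value: Any = doc.get(field)
        # Keep only real text; numeric values such as lat/long are skipped.
        if isinstance(value, str) and value.strip():
            parts.append(value.strip())
    return ". ".join(parts)


doc = {
    "name": "Harbour Cafe",
    "description": "Small cafe near the old port",
    "lat": 54.35,
    "long": 18.65,
}
print(to_cluster_text(doc))  # prints: Harbour Cafe. Small cafe near the old port
```

Geo-distance or other numeric grouping would have to be handled separately, outside the text clustering step.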
The 1k restriction / recommendation is based on the design goal of Carrot2: to cluster small collections of texts (such as search results) fast enough so that the process can be done on-line. Carrot2 does well for collections of around 1k documents, but will not scale very well beyond several thousand documents.