I am making an application that organizes a set of documents (from a minimum of ~10 to a maximum of ~2000) into groups based on the words and phrases each document contains. Each document ranges from a single paragraph to about a page and a half.
I'm not looking for a document clustering library that clusters results around an initial search term, but one that clusters the documents without any search term.
Are there any libraries out there that do document clustering that can easily integrate with an Objective-C project?
I'm not very well-read in Objective-C, but if you can import native C code, then you could use the greedyRSC heuristic. We had very good results with it on the Reuters and LA-Times corpora.
A description of the method and the C code are available here: http://research.nii.ac.jp/~meh/greedyRSC/rscpage.html