I've got a database of hundreds of thousands of forum posts, and would like to tag them in an unsupervised way.
I noticed that StackOverflow's tag system suggests tags as I go. How does this algorithm work?
I also found this, which implies that it is SVM-based. Is that official? http://dl.acm.org/citation.cfm?id=2660970&dl=ACM&coll=DL&CFID=522960920&CFTOKEN=15091676
You could also try a shallow inverse regression (the authors call it deep, though) using Gensim and word embeddings for document classification. Ideally, using both the titles and the text of the forum posts, you should be able to build a pretty decent classification system. Follow along in this notebook and paper.
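To make the inversion idea concrete, here is a minimal sketch, not the notebook's code. The idea is to fit one language model per tag, score a new post under each model, and turn the scores into tag probabilities with Bayes' rule. Two assumptions to flag: it needs some posts that already have tags (so it is not fully unsupervised), and it swaps the per-tag word-embedding models for a smoothed unigram count model so that it runs with only the standard library. The function names and the toy posts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_class_models(docs, labels):
    """Fit one unigram count model per tag.

    Stand-in (assumption) for training one word-embedding model per tag.
    """
    counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    vocab = {word for counter in counts.values() for word in counter}
    return dict(counts), vocab


def log_likelihood(tokens, counter, vocab_size):
    """Add-one smoothed log-likelihood of a post under one tag's model."""
    total = sum(counter.values())
    return sum(
        math.log((counter[word] + 1) / (total + vocab_size)) for word in tokens
    )


def predict_proba(tokens, counts, vocab, priors):
    """Invert the per-tag likelihoods into tag probabilities (Bayes' rule)."""
    vocab_size = len(vocab) + 1  # one extra slot for unseen words
    scores = {
        tag: math.log(priors[tag]) + log_likelihood(tokens, counter, vocab_size)
        for tag, counter in counts.items()
    }
    # Subtract the max before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {tag: math.exp(score - top) for tag, score in scores.items()}
    norm = sum(exps.values())
    return {tag: value / norm for tag, value in exps.items()}


# Toy, made-up posts (title and body would be concatenated before tokenizing).
docs = [
    "how do i join two tables in sql".split(),
    "sql query returns duplicate rows".split(),
    "python list comprehension syntax error".split(),
    "how to read a file in python".split(),
]
labels = ["sql", "sql", "python", "python"]

counts, vocab = train_class_models(docs, labels)
priors = {tag: labels.count(tag) / len(labels) for tag in counts}

probs = predict_proba("sql join returns no rows".split(), counts, vocab, priors)
best_tag = max(probs, key=probs.get)
print(best_tag, probs)
```

With real data you would replace the unigram models with per-tag embedding models and, for multi-tag suggestions, return the few highest-probability tags rather than only the best one.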