How to keep a text classifier accurate as the corpus changes

Question

Asked by JeffR on 21 December 2014 at 21:13 · 119 views

I have a conceptual question regarding text classification. I have a corpus of English language documents that I want to classify based on the content of the document. I am working on building a classifier - I'm not sure yet what method I will use: possibly SVMs, Bayes or NN. I will have a training set of documents, and of course a test set.

Here's my question: The corpus of documents will be added to over time, so it is possible that the classifier constructed now will, over time as the corpus changes, become less accurate. How do I keep the classifier current and accurate? Do I implement regular re-training? Is there a method of continuous training as the corpus changes? How is this circumstance handled?

1 answer

**Montaser Awal** · Answer 1 · 2015-04-08T14:48:47+00:00

You have two possible solutions:

1. (The easiest) If you cannot guarantee a representative training dataset, consider redoing the training step at regular intervals (each time you have sufficient new examples).
2. Consider active (or incremental) learning. However, this method requires interaction from the end user, which is not always desirable.
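To make the incremental option more concrete, here is a minimal sketch, not a definitive implementation. It assumes a multinomial Naive Bayes model (one of the methods the question mentions) with whitespace tokenization and Laplace smoothing. The class `IncrementalNaiveBayes`, the helper `accuracy`, and the toy documents are all hypothetical names and data made up for illustration. Because Naive Bayes only stores counts, a newly labelled batch can be folded in without revisiting the old documents.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Multinomial Naive Bayes whose counts can be updated batch by batch."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_docs = Counter()               # documents seen per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()                        # every token seen so far

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase + whitespace split is enough for the sketch.
        return text.lower().split()

    def partial_fit(self, texts, labels):
        """Fold a newly labelled batch into the existing counts."""
        for text, label in zip(texts, labels):
            tokens = self.tokenize(text)
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return the class with the highest log-posterior (None if untrained)."""
        tokens = [t for t in self.tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of documents whose predicted label matches the given label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(texts)


# Initial training set (toy data).
model = IncrementalNaiveBayes()
model.partial_fit(
    ["cheap pills buy now", "meeting agenda for monday"],
    ["spam", "ham"],
)

# Later a newly labelled batch arrives: score it first, then learn from it.
new_texts = ["crypto giveaway click now", "quarterly budget review agenda"]
new_labels = ["spam", "ham"]
before = accuracy(model, new_texts, new_labels)
model.partial_fit(new_texts, new_labels)
after = accuracy(model, new_texts, new_labels)
```

Scoring each new batch before learning from it (`before` above) also gives a cheap drift signal for the first option: if that number falls below a threshold you choose, it may be time for a full retrain. If you use scikit-learn, estimators such as `MultinomialNB` and `SGDClassifier` expose the same idea through their own `partial_fit` method.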
