As we understand it, the ground truth is what is used to re-train the NLC or R&R.
The ground truth is question-level training data, e.g.
"How hot is it today?,temperature"
The question "How hot is it today?" is therefore classified into the "temperature" class.
Once the application is up, real user questions will be received. Some are the same (i.e. the questions from real users are identical to questions in the ground truth), some use similar terms, and some are entirely new questions. Assume the application has a feedback loop that tells us whether or not the class (for NLC) or answer (for R&R) is relevant.
For the new questions, the approach seems to be simply to add them to the ground truth, which is then used to re-train the NLC/R&R?
For the questions with similar terms, do we add them in the same way as the new questions, or do we ignore them, given that questions with similar terms can still score well even when those terms were not used to train the classifier?
In the case of identical questions, there seems to be nothing to do to the ground truth for NLC; for R&R, however, do we simply increase or decrease the relevance label in the ground truth by 1?
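For context, and purely as an assumption about the layout rather than a statement about your actual file, an R&R ground truth row pairs a question with answer IDs and integer relevance labels, so adjusting relevance would mean editing those numbers, e.g. (hypothetical document IDs):

    How hot is it today?,doc_temp_forecast,3,doc_rain_forecast,0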
The main question here is, in short, what the re-training approach is for NLC & R&R...
Once your application has gone live, you should periodically review your feedback log for opportunities for improvement. For NLC, if there are texts being incorrectly classified, then you can add those texts to the training set and retrain in order to improve your classifier.
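A minimal sketch of that retraining step, assuming plain REST calls with the Python requests library; the URL, credentials, and file names are placeholders for your own service instance. Note that an NLC classifier cannot be updated in place, so retraining means creating a new classifier from the enlarged CSV and switching your application to the new classifier_id once its status is Available.

    import json
    import requests

    NLC_URL = "https://gateway.watsonplatform.net/natural-language-classifier/api/v1"
    AUTH = ("your-username", "your-password")  # placeholder service credentials

    def retrain(csv_path, name="my-classifier-v2"):
        # Train a new classifier from the updated ground truth CSV.
        with open(csv_path, "rb") as csv_file:
            files = {
                "training_metadata": ("metadata.json",
                                      json.dumps({"language": "en", "name": name})),
                "training_data": ("ground_truth.csv", csv_file),
            }
            resp = requests.post(NLC_URL + "/classifiers", auth=AUTH, files=files)
        resp.raise_for_status()
        return resp.json()["classifier_id"]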
It is not necessary to capture every imaginable variation of a class, as long as your classifier is returning acceptable responses.
You could use the additional examples of classes from your log to assemble a test set of texts that do not feature in your training set. Running this test set when you make changes will enable you to determine whether or not a change has inadvertently caused a regression. You can run this test either by calling the classifier using a REST client, or via the Beta Natural Language Classifier toolkit.
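As a sketch of the first option (calling the classifier with a REST client), again with placeholder credentials and assuming the test set uses the same two-column text,class CSV layout as the training data:

    import csv
    import requests

    NLC_URL = "https://gateway.watsonplatform.net/natural-language-classifier/api/v1"
    AUTH = ("your-username", "your-password")  # placeholder service credentials

    def run_test_set(classifier_id, test_csv_path):
        # Classify each held-out text and report simple accuracy,
        # so a drop after retraining flags a regression.
        correct = total = 0
        with open(test_csv_path) as f:
            for text, expected in csv.reader(f):
                resp = requests.post(
                    NLC_URL + "/classifiers/" + classifier_id + "/classify",
                    auth=AUTH, json={"text": text})
                resp.raise_for_status()
                if resp.json()["top_class"] == expected:
                    correct += 1
                total += 1
        return correct / float(total)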