IBM Watson Natural Language Classifier (NLC) limits the text values in the training set to 1024 characters: https://console.bluemix.net/docs/services/natural-language-classifier/using-your-data.html#training-limits .
However, the trained model can then classify any text of up to 2048 characters: https://console.bluemix.net/apidocs/natural-language-classifier#classify-a-phrase .
This difference confuses me. As far as I know, the same pre-processing should be applied in both the training phase and the production phase, so if I have to truncate the training data at 1024 characters, I would truncate the production inputs at 1024 characters as well.
Is my reasoning correct? Should I truncate the text in production at 1024 characters (as I think I should) or at 2048 characters (perhaps because 1024 characters are too few)?
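To make the question concrete, here is a minimal sketch of what I mean by applying the same truncation in both phases. The helper names (`truncate`, `prepare_training_row`, `prepare_for_classification`) are hypothetical and are not part of any Watson SDK; only the two limits come from the documentation linked above.

```python
# Limits taken from the NLC documentation linked in the question.
TRAIN_LIMIT = 1024     # max characters per text value in the training set
CLASSIFY_LIMIT = 2048  # max characters accepted at classification time


def truncate(text, limit=TRAIN_LIMIT):
    """Hypothetical helper: keep at most `limit` characters of `text`."""
    return text[:limit]


def prepare_training_row(text, label):
    """Build one (text, label) training row, truncated to the training limit."""
    return truncate(text, TRAIN_LIMIT), label


def prepare_for_classification(text):
    """Option A from the question: reuse the training limit in production."""
    return truncate(text, TRAIN_LIMIT)


def prepare_for_classification_long(text):
    """Option B from the question: use the larger classify limit instead."""
    return truncate(text, CLASSIFY_LIMIT)
```

With option A, a 3000-character input reaches the classifier as 1024 characters, exactly like the training rows; with option B it reaches the classifier as 2048 characters, which the model never saw at that length during training.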
Thank you in advance!
I recently had the same question, and one of the answers on an article clarified it.
Here's the link to the article