I trained a multinomial Naive Bayes model for binary text classification: detecting whether a text contains personal data.
model = MultinomialNB()
model.fit(X_train, y_train)
I can't figure out how to use the already trained model to check arbitrary new text. The call y_pred = model.predict(new_X_test) fails with "ValueError: X has 1 features, but MultinomialNB is expecting 6328 features as input.". As far as I understand, this is related to the CountVectorizer text preprocessor (from sklearn.feature_extraction.text import CountVectorizer).
For example:
X = vectorizer.fit_transform(df['sentence'])
X.shape
returns:
(1600, 6328)
So the object X has 6328 features.
X is the same matrix as the X_train on which the model was trained.
So, to check new text with this trained model, I need to process it with CountVectorizer in such a way that the object I feed to the model also has 6328 features.
But I don't understand at all how to do it.
Or maybe I'm wrong about something?
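
For reference, a minimal sketch of the usual pattern, assuming the same fitted vectorizer object is still available: call transform (not fit_transform) on the new text, so it is mapped onto the vocabulary learned during training and gets the same number of columns. The toy sentences, labels and the names train_sentences, train_labels, new_texts and new_X are made up for illustration and stand in for df['sentence'] and the real labels.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for df['sentence'] and the real labels.
train_sentences = [
    "my phone number is 555 0100",
    "the weather is nice today",
    "contact me at my home address",
    "the meeting starts at noon",
]
train_labels = [1, 0, 1, 0]  # 1 = contains personal data, 0 = does not

# Learn the vocabulary on the training text only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_sentences)

model = MultinomialNB()
model.fit(X_train, train_labels)

# New text must be an iterable of strings (a list), not a bare string.
new_texts = ["here is my phone number"]

# transform() reuses the learned vocabulary, so the column count matches
# X_train; words not seen during training are simply ignored.
new_X = vectorizer.transform(new_texts)

y_pred = model.predict(new_X)
print(X_train.shape, new_X.shape, y_pred)
```

If the prediction happens in a different script or session, the fitted vectorizer has to be saved and loaded together with the model. Fitting a fresh CountVectorizer on the new text would learn a different vocabulary and produce a different number of features.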