Apologies for the naive question. I've trained a model (Naive Bayes) in Python, and it does well (95% accuracy). It takes an input string (e.g. 'Apple Inc.' or 'John Doe') and discerns whether it's a business name or a customer name.
How do I actually implement this on another data set? If I bring in another pandas dataframe, how do I apply what the model has learned from the training data to the new dataframe?
The new dataframe has a completely new population and set of strings, and the model needs to predict whether each one is a business name or a customer name.
Ideally I would like to insert into the new dataframe a column that has the model's prediction.
Any code snippets are appreciated.
Sample code of current model:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    df["CUST_NM_CLEAN"], df["LABEL"], test_size=0.20, random_state=1)
# Instantiate the CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
count_vector = CountVectorizer()
# Fit the training data and then return the matrix
training_data = count_vector.fit_transform(X_train)
# Transform testing data and return the matrix.
testing_data = count_vector.transform(X_test)
# In this case we try multinomial; there are other Naive Bayes variants (e.g. BernoulliNB, GaussianNB)
from sklearn.naive_bayes import MultinomialNB
naive_bayes = MultinomialNB()
naive_bayes.fit(training_data,y_train)
#MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
predictions = naive_bayes.predict(testing_data)
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('Accuracy score: {}'.format(accuracy_score(y_test, predictions)))
print('Precision score: {}'.format(precision_score(y_test, predictions, pos_label='Org')))
print('Recall score: {}'.format(recall_score(y_test, predictions, pos_label='Org')))
print('F1 score: {}'.format(f1_score(y_test, predictions, pos_label='Org')))
Figured it out.
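For anyone landing here with the same question, below is a minimal sketch of one common way to do this. It is not necessarily what I ended up with. The key point is to reuse the already fitted `count_vector` and call `transform` (not `fit_transform`) on the new strings, so the new matrix has the same columns the model was trained on. The tiny training set, the 'Person' label, the `new_df` dataframe and the `PREDICTION` column name are all made up for illustration; only 'Org', `CUST_NM_CLEAN` and `LABEL` come from the code above.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up stand-in for the original training dataframe (assumption).
train_df = pd.DataFrame({
    "CUST_NM_CLEAN": ["apple inc", "acme corp", "globex inc", "initech corp",
                      "john doe", "jane smith", "john smith", "jane doe"],
    "LABEL": ["Org", "Org", "Org", "Org",
              "Person", "Person", "Person", "Person"],  # 'Person' is hypothetical
})

# Fit the vectorizer and the model once, on the training data only.
count_vector = CountVectorizer()
training_data = count_vector.fit_transform(train_df["CUST_NM_CLEAN"])
naive_bayes = MultinomialNB()
naive_bayes.fit(training_data, train_df["LABEL"])

# A new dataframe with strings the model has never seen (hypothetical).
new_df = pd.DataFrame({"CUST_NM_CLEAN": ["umbrella corp", "mary smith"]})

# Reuse the fitted vectorizer: transform only, so the vocabulary stays the same.
new_data = count_vector.transform(new_df["CUST_NM_CLEAN"])

# Store the model's predictions as a new column.
new_df["PREDICTION"] = naive_bayes.predict(new_data)
print(new_df)
```

Words that were not in the training vocabulary (like 'umbrella' here) are simply ignored by `transform`. If the new dataframe is scored in a separate session, both the fitted vectorizer and the fitted model would need to be saved and reloaded together, for example with `joblib.dump` and `joblib.load`.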