I have a scikit-learn pipeline consisting of a TF-IDF vectorizer (TfidfVectorizer) followed by a Logistic Regression classifier.
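
For reference, here is a minimal sketch of such a pipeline. The four-document corpus and its labels are invented, and the vectorizer step name `'tfidf'` is an assumption; only `'logr'` comes from the question. It shows that the LogisticRegression step is fitted on one column per TF-IDF vocabulary term, not on raw strings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the real training data.
texts = [
    "great product, works well",
    "terrible, broke after a day",
    "works great, very happy",
    "awful quality, very disappointed",
]
labels = [1, 0, 1, 0]

# 'tfidf' is an assumed step name; 'logr' is the name used in the question.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("logr", LogisticRegression()),
])
pipeline.fit(texts, labels)

# The classifier learns one coefficient per vocabulary term.
vocab_size = len(pipeline.named_steps["tfidf"].vocabulary_)
n_coefs = pipeline.named_steps["logr"].coef_.shape[1]
print(vocab_size, n_coefs)  # both equal the TF-IDF vocabulary size
```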

I am trying to use eli5's `show_prediction` function to explain predictions on my text data (NLP).

```python
import eli5

# rand is just a random integer, and feat_ns is the list of all of my feature names.
# X_test comes from my train/test split.
# Yes, the brackets around X_test[rand] are funky, but this is what the function asked for.
eli5.show_prediction(pipeline.named_steps['logr'], doc=[[X_test[rand]]], top=30, feature_names=feat_ns)
```

The call fails with:

```
X has 1 features per sample; expecting 13791
```
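
The mismatch can be reproduced with a toy setup (invented corpus; the vectorizer and classifier are fitted separately here so the two inputs are easy to compare). A raw document wrapped as `[[doc]]` is a 1 x 1 array, while the classifier expects one column per vocabulary term. In this sketch a 1 x 1 numeric array stands in for the un-vectorized document to trigger the same kind of feature-count error; the exact message wording differs between scikit-learn versions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus standing in for the real training data.
texts = [
    "great product, works well",
    "terrible, broke after a day",
    "works great, very happy",
    "awful quality, very disappointed",
]
labels = [1, 0, 1, 0]

tfidf = TfidfVectorizer().fit(texts)
clf = LogisticRegression().fit(tfidf.transform(texts), labels)

doc = "very happy with it"
raw = np.array([[doc]], dtype=object)  # shape (1, 1): the whole string is one "feature"
vectorized = tfidf.transform([doc])    # shape (1, vocabulary size)
print(raw.shape, vectorized.shape)

# A single numeric column stands in for the un-vectorized document.
mismatch_error = None
try:
    clf.predict(np.zeros((1, 1)))
except ValueError as err:
    mismatch_error = str(err)  # wording varies across scikit-learn versions
print(mismatch_error)
```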

Best answer (amstergc20):

I was able to answer my own question.

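The cause is that my X_test data had not yet been passed through the TF-IDF vectorizer, so it did not have the dimensions the classifier requires. The LogisticRegression step expects the 13,791 TF-IDF features it was trained on, but the raw document arrives as a single column, hence "X has 1 features per sample".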
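
A minimal sketch of that fix: run the document through the pipeline's own fitted vectorizer before handing it to the classifier step. The corpus is invented, `doc` stands in for `X_test[rand]`, and `'tfidf'` is an assumed step name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the real training data.
texts = [
    "great product, works well",
    "terrible, broke after a day",
    "works great, very happy",
    "awful quality, very disappointed",
]
labels = [1, 0, 1, 0]

# 'tfidf' is an assumed step name; 'logr' matches the question.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("logr", LogisticRegression()),
]).fit(texts, labels)

doc = "very happy with it"  # stands in for X_test[rand]

# Vectorize with the already-fitted TF-IDF step; transform() takes an iterable of documents.
X_doc = pipeline.named_steps["tfidf"].transform([doc])

# The classifier now receives the number of columns it was trained on.
proba = pipeline.named_steps["logr"].predict_proba(X_doc)
print(X_doc.shape, proba)
```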

Because I passed only the classifier step (`pipeline.named_steps['logr']`), the function never ran the text through the vectorizer in my pipeline.
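
An alternative sketch that keeps the preprocessing inside the pipeline: slicing `pipeline[:-1]` (supported in scikit-learn 0.21 and later) gives every step except the final classifier, and calling `predict` on the whole pipeline runs all steps on raw text. As before, the corpus, `doc`, and the `'tfidf'` step name are assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the real training data.
texts = [
    "great product, works well",
    "terrible, broke after a day",
    "works great, very happy",
    "awful quality, very disappointed",
]
labels = [1, 0, 1, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),  # assumed step name
    ("logr", LogisticRegression()),
]).fit(texts, labels)

doc = "very happy with it"  # stands in for X_test[rand]

# Every step except the final estimator: here, just the fitted TfidfVectorizer.
X_doc = pipeline[:-1].transform([doc])

# Same matrix as calling the named vectorizer step directly.
same_as_step = np.allclose(
    X_doc.toarray(), pipeline.named_steps["tfidf"].transform([doc]).toarray()
)

# The full pipeline accepts raw text and vectorizes it internally.
pred_full = pipeline.predict([doc])
pred_step = pipeline.named_steps["logr"].predict(X_doc)
print(same_as_step, pred_full, pred_step)
```

With eli5 itself, passing the fitted vectorizer through the `vec` argument, for example `eli5.show_prediction(pipeline.named_steps['logr'], X_test[rand], vec=pipeline.named_steps['tfidf'], top=30)`, should let eli5 vectorize the raw document and take feature names from the vectorizer, so the extra brackets and `feature_names` are no longer needed (`'tfidf'` is again an assumed step name).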