Naive Bayes classifier handling different data types in Python


I am trying to implement a Naive Bayes classifier in Python. My attributes are of different data types: strings, int, float, Boolean, and ordinal.

I could use the Gaussian Naive Bayes classifier (sklearn.naive_bayes in the scikit-learn Python package), but I do not know how the different data types should be handled. The classifier throws an error stating that it cannot handle data types other than int or float.

One way I can think of is encoding the strings as numerical values, but I doubt how well the classifier would perform if I do this.

2

There are 2 answers

5
Numlet On BEST ANSWER

Yes, you will need to convert the strings to numerical values. The naive Bayes classifier cannot handle strings, as there is no way a string can enter into a mathematical equation.

If your strings have a natural order, for example "large", "medium", "small", you might want to encode them as 3, 2, 1. However, if your strings have no order, such as colours or names, you can either number them the same way or assign binary variables, one per colour or name, provided there are not too many of them.

For example, if you are classifying cars and they can be red, blue, or green, you can define the variables 'Red', 'Blue', and 'Green', each taking the value 0 or 1 depending on the colour of your car.
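A minimal sketch of both encodings in plain Python may make this concrete. The names SIZE_RANK, COLOURS, and encode_car are hypothetical and only for illustration; the sizes and colours are the ones from the examples above.

```python
# Ordered categories: map each label to its rank.
SIZE_RANK = {"small": 1, "medium": 2, "large": 3}

# Unordered categories: one 0/1 indicator variable per possible value.
COLOURS = ["red", "blue", "green"]


def encode_car(size, colour):
    """Return [size_rank, is_red, is_blue, is_green] for one car."""
    indicators = [1 if colour == c else 0 for c in COLOURS]
    return [SIZE_RANK[size]] + indicators


cars = [("large", "red"), ("small", "green"), ("medium", "blue")]
X = [encode_car(size, colour) for size, colour in cars]
print(X)  # [[3, 1, 0, 0], [1, 0, 0, 1], [2, 0, 1, 0]]
```

The resulting rows contain only numbers, so they can be passed to a classifier that accepts int or float features.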

0
Jijo Jose On

Don't convert the data types manually; use scikit-learn's DictVectorizer instead.

http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html
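As a rough sketch of how that could look, assuming each sample is stored as a dict: DictVectorizer one-hot encodes string values and passes numeric values through unchanged. The records, labels, and feature names below are made up for illustration, and pairing the result with GaussianNB is an assumption based on the question.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: one dict per sample, mixing strings and numbers.
records = [
    {"colour": "red", "doors": 2, "weight": 1.1},
    {"colour": "blue", "doors": 4, "weight": 1.6},
    {"colour": "green", "doors": 4, "weight": 1.5},
    {"colour": "red", "doors": 2, "weight": 1.0},
]
labels = ["sport", "family", "family", "sport"]

# sparse=False returns a dense NumPy array, which GaussianNB expects.
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(records)

# String features become one column per value, e.g. "colour=red".
print(vectorizer.feature_names_)

model = GaussianNB()
model.fit(X, labels)

# New samples must go through the same fitted vectorizer.
new_car = {"colour": "red", "doors": 2, "weight": 1.05}
prediction = model.predict(vectorizer.transform([new_car]))
print(prediction[0])
```

Note that GaussianNB still models every column, including the 0/1 indicator columns, as a Gaussian, so this only addresses the data type error, not the question of how well the classifier performs.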