I have data in the following format:
blah sentence one --> label1, label2
blah sentence two --> label2, label4
blah sentence three --> label3
How can I use OneVsRestClassifier with NaiveBayesClassifier in Spark?
(That is, how should my data be structured?)
For multi-class classification with NaiveBayes, the LabeledPoint class contains a label and a feature vector. But for the multi-label case above, how should the data be structured?
Structure the data as usual (LabeledPoint), but train multiple classifiers (e.g., OneVsRest) and change the data passed to each one based on your multi-labelled examples: for the classifier of a given label, a sentence is a positive example if that label is among its labels and a negative example otherwise. Another solution is to get the probabilities for all classes, instead of only the most probable one (predict(p.features())), and then take the top-k most probable predictions, filtered by a threshold.
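
To make both ideas concrete, here is a minimal sketch in plain Python (no Spark dependency), so it only illustrates how the data could be arranged. The helper names `parse_line`, `binary_datasets`, and `top_k_labels` are hypothetical, the raw sentence stands in for the feature vector you would normally compute (the featurization step is omitted), and the probabilities in the usage part are made-up numbers, not model output.

```python
from typing import Dict, List, Tuple


def parse_line(line: str) -> Tuple[str, List[str]]:
    """Split 'sentence --> label1, label2' into (sentence, [labels])."""
    text, _, labels = line.partition("-->")
    return text.strip(), [l.strip() for l in labels.split(",") if l.strip()]


def binary_datasets(
    rows: List[Tuple[str, List[str]]]
) -> Dict[str, List[Tuple[float, str]]]:
    """Build one binary dataset per label.

    Each entry is (label, text), mirroring the (label, features) shape of a
    LabeledPoint: 1.0 if the sentence carries that label, 0.0 otherwise.
    """
    all_labels = sorted({l for _, labels in rows for l in labels})
    return {
        label: [(1.0 if label in labels else 0.0, text) for text, labels in rows]
        for label in all_labels
    }


def top_k_labels(probs: Dict[str, float], k: int, threshold: float) -> List[str]:
    """Keep at most k labels, most probable first, dropping any below threshold."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, p in ranked[:k] if p >= threshold]


lines = [
    "blah sentence one --> label1, label2",
    "blah sentence two --> label2, label4",
    "blah sentence three --> label3",
]
rows = [parse_line(line) for line in lines]

# Approach 1: one binary training set per label (one classifier each).
datasets = binary_datasets(rows)
print([y for y, _ in datasets["label2"]])  # [1.0, 1.0, 0.0]

# Approach 2: threshold the per-class probabilities of a single classifier.
# These probabilities are illustrative values only.
example_probs = {"label1": 0.5, "label2": 0.3, "label3": 0.15, "label4": 0.05}
print(top_k_labels(example_probs, k=3, threshold=0.2))  # ['label1', 'label2']
```

In Spark, each of the per-label lists would become its own collection of LabeledPoint objects after the text is turned into a feature vector, and `k` and `threshold` would need tuning on held-out data.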