Multilabel Classification using NaiveBayes Classifier in Spark


I have data in the following format:
blah sentence one --> label1, label2
blah sentence two --> label2, label4
blah sentence three --> label3

How can I use OneVsRestClassifier with NaiveBayesClassifier in Spark, i.e., how should my data be structured? For multi-class classification with NaiveBayes, each LabeledPoint contains a single label and a feature vector. But how should the data be structured in the case above, where one sentence can have several labels?


There is 1 answer

marilena.oita On

Structure the data as usual (LabeledPoint), but train multiple classifiers (e.g., OneVsRest), one per label, and change the label passed to each one based on your multi-labelled vectors. Another solution is to get the probabilities for all classes, instead of only the most probable class (predict(p.features())):

Vector prediction = model.predictProbabilities(p.features());

and then take the top-k most probable predictions, using threshold filtering.
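
Both suggestions above can be sketched in plain Java without any Spark dependency. This is only an illustration under stated assumptions: the class MultiLabelSketch and its methods binaryTargets and topKAboveThreshold are hypothetical names, not part of Spark, and the probabilities are passed as a double[] (a Spark Vector can be converted with toArray()). The first method builds the 0/1 target column you would put into the LabeledPoints of the classifier for one label; the second keeps at most k class indices whose probability reaches a threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration; none of these names come from Spark.
final class MultiLabelSketch {

    // One-vs-rest idea: build the 0/1 target column for a single label.
    // targets[i] is 1.0 if example i carries `label`, otherwise 0.0.
    static double[] binaryTargets(List<Set<String>> labelSets, String label) {
        double[] targets = new double[labelSets.size()];
        for (int i = 0; i < targets.length; i++) {
            targets[i] = labelSets.get(i).contains(label) ? 1.0 : 0.0;
        }
        return targets;
    }

    // Probability idea: keep at most k class indices whose probability is
    // at least `threshold`, ordered from most to least probable.
    static List<Integer> topKAboveThreshold(double[] probs, int k, double threshold) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            if (probs[i] >= threshold) {
                indices.add(i);
            }
        }
        // Sort the surviving indices by descending probability.
        indices.sort((a, b) -> Double.compare(probs[b], probs[a]));
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }

    public static void main(String[] args) {
        // Label sets of the three example sentences from the question.
        List<Set<String>> labelSets = Arrays.<Set<String>>asList(
            new HashSet<String>(Arrays.asList("label1", "label2")),
            new HashSet<String>(Arrays.asList("label2", "label4")),
            new HashSet<String>(Arrays.asList("label3")));
        System.out.println(Arrays.toString(binaryTargets(labelSets, "label2")));

        // Made-up class probabilities, standing in for a model's output.
        double[] probs = {0.05, 0.50, 0.10, 0.35};
        System.out.println(topKAboveThreshold(probs, 2, 0.2));
    }
}
```

With the first approach you would call binaryTargets once per distinct label and train one binary NaiveBayes model on each resulting column. With the second, a single multi-class model is enough, but k and the threshold need tuning on held-out data.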