Mahout 0.9: Using my own test set instead of the split command

I followed these two guides to run the Mahout Naive Bayes (NB) classifier:

[1] http://tharindu-rusira.blogspot.com/2014/01/naive-bayes-classification-apache-mahout.html
[2] http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/

I would like to use my own test set instead of having Mahout split my data into training and test sets (80:20). How can I do that?

There is 1 answer

Rajkumar (best answer)

Take two datasets: one for training and one for testing.

Run the following commands on both sets, in this order:
1. seqdirectory
2. seq2sparse

Now you will have vectors generated for both datasets.
- Run the trainnb command on the vectors generated from the first dataset. Instead of training the model on 80% of the data, you train it on the whole training dataset.
- Run the testnb command on the vectors generated from the second dataset. This is not a 20% slice of the training data; it is a completely separate dataset used solely for testing.

This way you skip the mahout split step entirely and test the model against a dataset of your own choosing.
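
The steps above can be sketched as a small shell script. This is only an illustration, not a tested recipe. The directory names (`train-raw`, `test-raw`, `nb-work`) and the `DRY_RUN` switch are hypothetical, and the layout of one subdirectory per category under each raw directory is an assumption. The flags are the ones used in Mahout's bundled 20 Newsgroups example. By default the script only prints the commands it would run; set `DRY_RUN=0` to actually execute them.

```shell
#!/usr/bin/env bash
# Sketch: train on one dataset and test on another, without `mahout split`.
# All paths below are hypothetical placeholders.
set -euo pipefail

TRAIN_RAW="${TRAIN_RAW:-train-raw}"   # assumed layout: one subdirectory per category
TEST_RAW="${TEST_RAW:-test-raw}"      # same layout, completely separate documents
WORK="${WORK:-nb-work}"               # working directory for intermediate output
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print the commands, 0 = run them

# Print a command and, unless in dry-run mode, execute it.
run() {
  echo "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Convert a directory of raw text into TF-IDF vectors.
# $1 = raw text directory, $2 = prefix for the output directories
vectorize() {
  run mahout seqdirectory -i "$1" -o "$WORK/$2-seq" -ow
  run mahout seq2sparse -i "$WORK/$2-seq" -o "$WORK/$2-vectors" -lnorm -nv -wt tfidf
}

pipeline() {
  vectorize "$TRAIN_RAW" train
  vectorize "$TEST_RAW" test

  # Train on ALL of the training vectors (no 80:20 split).
  run mahout trainnb -i "$WORK/train-vectors/tfidf-vectors" -el \
      -o "$WORK/model" -li "$WORK/labelindex" -ow

  # Evaluate on the separate test vectors only.
  run mahout testnb -i "$WORK/test-vectors/tfidf-vectors" \
      -m "$WORK/model" -l "$WORK/labelindex" -ow -o "$WORK/test-result"
}

pipeline
```

One thing to check before trusting the reported accuracy: seq2sparse builds its dictionary from the input it is given, so vectorizing the two sets in separate runs may assign different indices to the same term. Verify that the test vectors line up with the training dictionary.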