Mahout 0.9: Using my own test set instead of the split command

I have referred to these two links to run the Mahout Naive Bayes (NB) classifier:


I would like to use my own test set instead of having Mahout split my data into training and test sets (80:20). How can I achieve that?


1 answer

Take two datasets: one for training and one for testing.

Run the following commands on both sets:
1. seqdirectory
2. seq2sparse
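A minimal sketch of those two vectorization steps, assuming Mahout 0.9's `mahout` launcher is on the PATH and each corpus is a directory with one subdirectory per class label (the layout used by Mahout's 20 Newsgroups example, which lets `trainnb -el` derive labels from the document keys later). The function name `vectorize`, the `MAHOUT` override variable, and the output paths are hypothetical; the `-lnorm -nv -wt tfidf` options mirror that example script.

```bash
#!/usr/bin/env bash
# Sketch: turn one raw text corpus into Mahout sparse TF-IDF vectors.
# Paths and the vectorize() name are hypothetical placeholders.
set -euo pipefail

# Launcher is overridable, e.g. MAHOUT=./bin/mahout, or MAHOUT=echo for a dry run.
MAHOUT="${MAHOUT:-mahout}"

# vectorize <raw_dir> <work_dir>
#   1. seqdirectory: text files -> SequenceFiles (key = /label/filename)
#   2. seq2sparse:   SequenceFiles -> vectors; with -wt tfidf the TF-IDF
#                    vectors end up in <work_dir>/vectors/tfidf-vectors
vectorize() {
  local raw_dir="$1" work_dir="$2"
  "$MAHOUT" seqdirectory -i "$raw_dir" -o "$work_dir/seq" -ow
  "$MAHOUT" seq2sparse -i "$work_dir/seq" -o "$work_dir/vectors" -lnorm -nv -wt tfidf
}
```

For example, `vectorize /data/train-raw /tmp/nb/train` followed by `vectorize /data/test-raw /tmp/nb/test`. One caveat worth checking: seq2sparse builds its dictionary from the corpus it is given, so two independent runs can assign different vector indices to the same term. If the testnb accuracy looks implausibly low, a mismatch between the training and test dictionaries is a likely cause.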

You now have vectors generated for both datasets.
- Run the trainnb command on the first dataset's vectors. Instead of training the model on 80% of the data, you train it on the whole dataset.
- Run the testnb command on the second dataset's vectors. This is not a 20% holdout; it is a completely separate dataset used solely for testing.
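The training and testing steps can be sketched the same way. This assumes the vectors were built with TF-IDF weighting (seq2sparse's `-wt tfidf`), so each dataset's vectors sit in a `tfidf-vectors` subdirectory of the seq2sparse output. The function names, the `MAHOUT` override, and the paths are hypothetical; the flags follow the trainnb/testnb invocations in Mahout 0.9's 20 Newsgroups example.

```bash
#!/usr/bin/env bash
# Sketch: train on one vectorized corpus, evaluate on a separate one (no `mahout split`).
# Function names and paths are hypothetical placeholders.
set -euo pipefail

# Launcher is overridable, e.g. MAHOUT=./bin/mahout, or MAHOUT=echo for a dry run.
MAHOUT="${MAHOUT:-mahout}"

# train_model <train_tfidf_vectors> <model_dir>
# Trains on 100% of the first dataset. -el extracts labels from the document
# keys; the model and label index are written under model_dir.
train_model() {
  local train_vectors="$1" model_dir="$2"
  "$MAHOUT" trainnb -i "$train_vectors" -el \
    -o "$model_dir/model" -li "$model_dir/labelindex" -ow
}

# evaluate_model <test_tfidf_vectors> <model_dir> <out_dir>
# Classifies the separate test set with the trained model and label index,
# writing the classification output to out_dir.
evaluate_model() {
  local test_vectors="$1" model_dir="$2" out_dir="$3"
  "$MAHOUT" testnb -i "$test_vectors" -m "$model_dir/model" \
    -l "$model_dir/labelindex" -ow -o "$out_dir"
}
```

For example, `train_model /tmp/nb/train/vectors/tfidf-vectors /tmp/nb` and then `evaluate_model /tmp/nb/test/vectors/tfidf-vectors /tmp/nb /tmp/nb/test-results`. Keeping the two steps as separate functions makes it easy to re-run testnb against further held-out sets without retraining.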

So instead of using mahout split, you have supplied your own dataset for testing the model.