Vowpal Wabbit training and testing data formats


I'm trying Vowpal Wabbit and am in the process of figuring out the file formats required for training and testing. I've been following the tutorial from https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial and see that the following is the training data format:

0 | price:.23 sqft:.25 age:.05 2006
1 2 'second_house | price:.18 sqft:.15 age:.35 1976
0 1 0.5 'third_house | price:.53 sqft:.32 age:.87 1924
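Reading the three tutorial lines: everything before the first bar is the label section (label, then optional importance weight, then optional base, then an optional 'tag), and everything after it is the feature list, where a bare token like 2006 is a feature with implicit value 1. A simplified parser makes that split explicit. It is a sketch only. It assumes a single unnamed namespace as in the tutorial, and the name parse_vw_line is hypothetical, not part of VW.

```python
def parse_vw_line(line):
    """Split one VW text line into (label fields, tag, features).

    Simplified sketch: assumes one unnamed namespace, as in the tutorial.
    """
    head, sep, body = line.partition("|")
    if not sep:
        raise ValueError("missing '|' separator; not a VW example line")
    tokens = head.split()
    tag = None
    # A token starting with an apostrophe is the example's tag.
    if tokens and tokens[-1].startswith("'"):
        tag = tokens.pop()[1:]
    # Remaining tokens are positional: label, importance weight, base.
    names = ["label", "importance", "base"]
    fields = {name: float(tok) for name, tok in zip(names, tokens)}
    features = {}
    for tok in body.split():
        name, _, value = tok.partition(":")
        # A feature written without ":value" gets the implicit value 1.
        features[name] = float(value) if value else 1.0
    return fields, tag, features


if __name__ == "__main__":
    print(parse_vw_line("1 2 'second_house | price:.18 sqft:.15 age:.35 1976"))
    print(parse_vw_line("| price:.23 sqft:.25 age:.05 2006"))
```

An empty label section (a line starting with the bar) simply yields no label fields, which is how an unlabeled example looks.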

For the testing data, I don't have the labels or any outputs, but just the features. How would I go about writing that out? I've tried just including the features like so:

price:.23 sqft:.25 age:.05 2006
price:.18 sqft:.15 age:.35 1976
price:.53 sqft:.32 age:.87 1924

But that throws exceptions because it isn't the proper format. I have also tried the following two variants, and both give me only 0s as predictions:

| price:.23 sqft:.25 age:.05 2006
| price:.18 sqft:.15 age:.35 1976
| price:.53 sqft:.32 age:.87 1924

0 0 0 | price:.23 sqft:.25 age:.05 2006
0 0 0 | price:.18 sqft:.15 age:.35 1976
0 0 0 | price:.53 sqft:.32 age:.87 1924

Does anyone know the format I should be aiming for, given only the features? Thanks for the help.

Answer by Martin Popel (accepted):

The bar symbol (|) must also be present in the format for predictions:

| price:.23 sqft:.25 age:.05 2006
| price:.18 sqft:.15 age:.35 1976
| price:.53 sqft:.32 age:.87 1924
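If the test features come from some other source, the unlabeled lines can be generated so that each one starts with the bar. A minimal sketch, assuming features are given as (name, value) pairs with value None for a bare token such as the year. The helper to_vw_line is hypothetical; it also emits the optional label, importance, base and tag fields from the tutorial when they are supplied.

```python
def to_vw_line(features, label=None, importance=None, base=None, tag=None):
    """Format one example in VW's text format (hypothetical helper).

    features: list of (name, value) pairs; value None means a bare feature
    such as the year token 2006, which VW treats as having value 1.
    With label=None the line starts with the bar, i.e. an unlabeled example.
    """
    head = []
    if label is not None:
        head.append(str(label))
        # importance and base are only written after a label
        if importance is not None:
            head.append(str(importance))
            if base is not None:
                head.append(str(base))
    if tag is not None:
        head.append("'" + tag)
    feats = " ".join(name if value is None else f"{name}:{value}"
                     for name, value in features)
    prefix = " ".join(head) + " " if head else ""
    return f"{prefix}| {feats}"


if __name__ == "__main__":
    rows = [
        [("price", 0.23), ("sqft", 0.25), ("age", 0.05), ("2006", None)],
        [("price", 0.18), ("sqft", 0.15), ("age", 0.35), ("1976", None)],
    ]
    for row in rows:
        print(to_vw_line(row))  # unlabeled test line with a leading bar
```

Python writes 0.23 rather than .23; both are plain numeric values, so the feature values are the same.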

If you don't include the correct labels, vw cannot compute the test loss, of course. To get the predictions, first train and save a model with vw -d train_set.vw -f model.vw, then run vw -d test_set.vw -t -i model.vw -p predictions.txt. The training set in the tutorial (with only three examples) is too small to train any reasonable model.
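Running vw in test-only mode (-t) without loading a trained model predicts from an untrained model, which is one way to end up with all-zero predictions. A sketch of the two-step workflow driven from Python, assuming the vw binary is on PATH. It uses vw's -f (save final model) and -i (load initial model) options alongside the -d, -t and -p options from the answer. The helper names and file names are hypothetical.

```python
import shlex
import subprocess


def vw_commands(train_path, test_path, model_path="model.vw",
                preds_path="predictions.txt"):
    """Return (train_argv, test_argv) for a train-then-predict run of vw."""
    # Step 1: learn from the labeled file and save the final model with -f.
    train_argv = ["vw", "-d", train_path, "-f", model_path]
    # Step 2: -t disables learning, -i loads the saved model,
    # and -p writes one prediction per input example.
    test_argv = ["vw", "-d", test_path, "-t", "-i", model_path,
                 "-p", preds_path]
    return train_argv, test_argv


def run_workflow(train_path, test_path):
    """Run both steps; requires vw on PATH and both input files to exist."""
    for argv in vw_commands(train_path, test_path):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_workflow() to execute them.
    for argv in vw_commands("train.vw", "test.vw"):
        print(shlex.join(argv))
```

Building the argument lists separately from running them keeps the commands easy to inspect before anything touches the file system.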