I'm trying out Vowpal Wabbit and am figuring out the file formats required for training and testing. I've been following the tutorial from https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial and see that the following is the training data format:
0 | price:.23 sqft:.25 age:.05 2006
1 2 'second_house | price:.18 sqft:.15 age:.35 1976
0 1 0.5 'third_house | price:.53 sqft:.32 age:.87 1924
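To make the layout of these lines concrete, here is a minimal Python sketch of how one such line splits into its parts. It assumes the simple single-namespace form used above: an optional label, importance and base, an optional tag marked by a leading quote, then the bar and the features. It does not handle named namespaces (|name) or any other VW syntax, and the function name parse_vw_line is hypothetical, not part of any VW library.

```python
from typing import Dict, List, Optional, Tuple


def parse_vw_line(line: str) -> Tuple[Dict[str, Optional[str]], Dict[str, float]]:
    """Split a simple single-namespace VW text line into header fields and features.

    Assumed layout: [label [importance [base]]] ['tag] | feature[:value] ...
    Named namespaces (e.g. "|ns") are NOT handled by this sketch.
    """
    header, _, body = line.partition("|")
    fields: Dict[str, Optional[str]] = {
        "label": None, "importance": None, "base": None, "tag": None,
    }
    numbers: List[str] = []
    for token in header.split():
        if token.startswith("'"):
            fields["tag"] = token[1:]  # a leading quote marks the tag
        else:
            numbers.append(token)      # label, importance, base in that order
    for name, value in zip(("label", "importance", "base"), numbers):
        fields[name] = value

    features: Dict[str, float] = {}
    for token in body.split():
        name, sep, value = token.partition(":")
        # A bare feature such as "2006" gets the implicit value 1
        features[name] = float(value) if sep else 1.0
    return fields, features
```

For example, parse_vw_line("| price:.23 sqft:.25 age:.05 2006") returns no label and no tag, with the bare feature 2006 getting the implicit value 1.0.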
For the testing data, I don't have the labels or any outputs, but just the features. How would I go about writing that out? I've tried just including the features like so:
price:.23 sqft:.25 age:.05 2006
price:.18 sqft:.15 age:.35 1976
price:.53 sqft:.32 age:.87 1924
But that gives me exceptions, since it's not the proper format. I have also tried the following, and both give me only 0s as results:
| price:.23 sqft:.25 age:.05 2006
| price:.18 sqft:.15 age:.35 1976
| price:.53 sqft:.32 age:.87 1924
0 0 0 | price:.23 sqft:.25 age:.05 2006
0 0 0 | price:.18 sqft:.15 age:.35 1976
0 0 0 | price:.53 sqft:.32 age:.87 1924
Does anyone know the format I should be aiming for when I only have the features? Thanks for the help.
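The bar symbol (|) must also be present in the format for predictions, even when there is no label:
| price:.23 sqft:.25 age:.05 2006
| price:.18 sqft:.15 age:.35 1976
| price:.53 sqft:.32 age:.87 1924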
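If the test features live in Python, a minimal sketch for writing such unlabeled lines might look like the following. The helper names to_vw_test_line and write_vw_test_file are hypothetical. The optional tag is an assumption on my part: the VW input layout puts the tag before the bar, and a tag can help match predictions back to inputs, but check it against your VW version.

```python
from typing import Dict, Iterable, Optional


def to_vw_test_line(features: Dict[str, float], tag: Optional[str] = None) -> str:
    """Format one unlabeled example: no label, optional 'tag, then the bar."""
    parts = []
    for name, value in features.items():
        # A value of exactly 1.0 is written as a bare feature, like "2006" above
        parts.append(name if value == 1.0 else f"{name}:{value}")
    prefix = f"'{tag} " if tag else ""
    return f"{prefix}| " + " ".join(parts)


def write_vw_test_file(path: str, rows: Iterable[Dict[str, float]]) -> None:
    """Write one unlabeled VW line per feature dict (hypothetical helper)."""
    with open(path, "w") as f:
        for row in rows:
            f.write(to_vw_test_line(row) + "\n")
```

For example, to_vw_test_line({"price": 0.23, "sqft": 0.25, "age": 0.05, "2006": 1.0}) produces "| price:0.23 sqft:0.25 age:0.05 2006", which is the same example as the first test line in VW format.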
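If you don't include the correct labels, vw cannot compute the test loss, of course. To get the predictions, first save a model when training (for example vw -d train_set.vw -f model.vw), then load it in test-only mode:
vw -d test_set.vw -t -i model.vw -p predictions.txt
Also note that the training set in the tutorial (with only three examples) is too small to train any reasonable model.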
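To read the predictions back into Python, a minimal sketch could parse the -p output line by line. It assumes each non-empty line holds the prediction, optionally followed by the example's tag when the test line had one; parse_predictions is a hypothetical helper name.

```python
from typing import Iterable, List, Optional, Tuple


def parse_predictions(lines: Iterable[str]) -> List[Tuple[float, Optional[str]]]:
    """Parse lines of a vw -p output file into (prediction, tag) pairs.

    Assumes each non-empty line is "<prediction>" or "<prediction> <tag>".
    """
    results: List[Tuple[float, Optional[str]]] = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        tag = tokens[1] if len(tokens) > 1 else None
        results.append((float(tokens[0]), tag))
    return results
```

Usage: open the file written by -p (predictions.txt in the command above) and pass the file object directly, e.g. parse_predictions(open("predictions.txt")), since iterating a file yields its lines.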