How to return predictions in the [0, 1] interval for SVMs in vowpal wabbit


Apologies if this has been asked already. Instead of the raw predictions (-r), I would like to return predictions in the [0, 1] interval for an SVM trained in vowpal wabbit with --loss_function hinge. Currently I'm trying the following, but it's not giving me what I want. Any thoughts?

vw -d vw_train_rand.vw -c -f svm_rand.vw --passes 10 --loss_function hinge -q cn;

vw -d vw_test_rand.vw -t -i svm_rand.vw -p preds_rand_svm.txt

Cheers

Aaron

EDIT:

1) Sample data:

-1 |c Loan.TypeConventional:1 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:0 |n Loan.Size:124500 LenderRank0612.0614:1939 ZipSquareMiles:53.1 MailDateMonth:5 ZipPerForeignBorn:11.4 ZipPerHighSchoolPlusDegree:57.2 ZipPerCollegePlusDegree:15.2 ZipPerVeterans:13.4 ZipPopPerSquareMile:798.1 ZipPerUnemployement:8.5 ZipSexRatio:96.7 ZipHousingUnitsPerSquareMile:315.1 ZipMedianHouseholdIncome:36238 ZipPerCapitaIncome:19085 MonthsDeedDatetoMailDate:2
-1 |c Loan.TypeConventional:1 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:0 |n Loan.Size:232000 LenderRank0612.0614:391 ZipSquareMiles:99.1 MailDateMonth:5 ZipPerForeignBorn:11.8 ZipPerHighSchoolPlusDegree:73.3 ZipPerCollegePlusDegree:39.3 ZipPerVeterans:9.1 ZipPopPerSquareMile:485.5 ZipPerUnemployement:5.9 ZipSexRatio:98.5 ZipHousingUnitsPerSquareMile:169.6 ZipMedianHouseholdIncome:78465 ZipPerCapitaIncome:31908 MonthsDeedDatetoMailDate:3
-1 |c Loan.TypeConventional:1 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:0 |n Loan.Size:90000 LenderRank0612.0614:130 ZipSquareMiles:32.6 MailDateMonth:5 ZipPerForeignBorn:51.5 ZipPerHighSchoolPlusDegree:60.7 ZipPerCollegePlusDegree:17.3 ZipPerVeterans:9.3 ZipPopPerSquareMile:783.2 ZipPerUnemployement:4.8 ZipSexRatio:97.2 ZipHousingUnitsPerSquareMile:274.2 ZipMedianHouseholdIncome:64668 ZipPerCapitaIncome:25632 MonthsDeedDatetoMailDate:3
-1 |c Loan.TypeConventional:0 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:1 |n Loan.Size:121301 LenderRank0612.0614:23 ZipSquareMiles:6.8 MailDateMonth:5 ZipPerForeignBorn:14.9 ZipPerHighSchoolPlusDegree:63.9 ZipPerCollegePlusDegree:24.2 ZipPerVeterans:10 ZipPopPerSquareMile:5245.1 ZipPerUnemployement:7.1 ZipSexRatio:93.3 ZipHousingUnitsPerSquareMile:2001.6 ZipMedianHouseholdIncome:56398 ZipPerCapitaIncome:25815 MonthsDeedDatetoMailDate:2

2) What I get currently:

-1.001968
-1.000737
-1.000441
-1.001823

3) What I'd like to see: Predictions in a continuous [0, 1] interval such that each entry can be interpreted as a forecasted probability associated with the event, e.g.:

0.012
0.009
0.010
0.0085
1

Accepted answer, by Martin Popel:

If you want to predict probabilities, you should train with --loss_function=logistic and test with --link=logistic. The hinge loss (used in SVMs) yields a max-margin classifier, which is not suitable for predicting probabilities.
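A minimal shell sketch of that workflow, reusing the question's data file names. The model and prediction file names (`logistic_rand.vw`, `preds_rand_logistic.txt`) and both function names are made up for illustration. The second function is an assumed by-hand equivalent of the logistic link, applying 1/(1+exp(-x)) to raw scores with awk. It is only meaningful for scores from a logistic-loss model, not for hinge-loss margins like the ones in the question.

```shell
#!/usr/bin/env bash
# Logistic-loss version of the question's commands (hypothetical output file names).
train_and_predict_logistic() {
  # The sample labels are -1/+1, which is what --loss_function logistic expects.
  vw -d vw_train_rand.vw -c -f logistic_rand.vw --passes 10 --loss_function logistic -q cn
  # --link logistic maps each raw score x to 1/(1+exp(-x)), so -p writes values in [0, 1].
  vw -d vw_test_rand.vw -t -i logistic_rand.vw --link logistic -p preds_rand_logistic.txt
}

# Hypothetical helper: the same sigmoid applied by hand to raw scores
# (one per line, first field used), read from the given files or from stdin.
raw_to_prob() {
  awk '{ printf "%.6f\n", 1 / (1 + exp(-$1)) }' "$@"
}
```

`raw_to_prob raw.txt` prints one value in [0, 1] per input line. `train_and_predict_logistic` needs `vw` on the PATH and the question's data files.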

Note that just using --loss_function=hinge does not turn VW into a kernel SVM (there is no kernel). If you want a support vector machine with a radial basis function kernel trained in an online fashion, use --ksvm --kernel=rbf (see vw --ksvm -h | grep -A9 KSVM for more parameters).
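For the kernel route, here is a sketch of training and testing with `--ksvm`. It is wrapped in a hypothetical `run` helper that only prints the commands when `DRY_RUN=1`. The `--bandwidth 1.0` value and the output file names are placeholders, not recommendations. Confirm the option names and defaults against `vw --ksvm -h` for your VW version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# RBF-kernel SVM via VW's --ksvm reduction (placeholder bandwidth and file names).
ksvm_train_and_test() {
  run vw -d vw_train_rand.vw --ksvm --kernel rbf --bandwidth 1.0 -f ksvm_rand.vw
  run vw -d vw_test_rand.vw -t -i ksvm_rand.vw -p preds_rand_ksvm.txt
}
```

`DRY_RUN=1 ksvm_train_and_test` previews both commands. Calling it without `DRY_RUN` runs them and requires `vw` on the PATH.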