I'm using scikit-learn's LinearSVC SVM implementation, and I'm trying to understand the multi-class prediction. Looking at coef_ and intercept_, I can get the hyperplane weights. For example, on my learning problem with two features and four labels I get:
f0 = 1.99861379*x1 - 0.09489263*x2 + 0.89433196
f1 = -2.04309715*x1 - 3.51285420*x2 - 3.1206355
f2 = 0.73536996*x1 + 2.52111207*x2 - 3.04176149
f3 = -0.56607817*x1 - 0.16981337*x2 - 0.92804815
When I use the decision_function method, I get values that correspond to the above functions. But the documentation says:
"The confidence score for a sample is the signed distance of that sample to the hyperplane."
As far as I can tell, decision_function does not return the signed distance; it just returns f().
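For reference, here is a minimal sketch of the check I'm describing. My real data isn't shown here, so make_blobs and the parameter choices (random_state, max_iter) are stand-in assumptions; only LinearSVC, coef_, intercept_ and decision_function come from my actual setup.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Synthetic stand-in for my data (assumption): two features, four labels.
X, y = make_blobs(n_samples=200, centers=4, n_features=2, random_state=0)

clf = LinearSVC(random_state=0, max_iter=10000).fit(X, y)

# One row of weights and one intercept per class.
print(clf.coef_.shape, clf.intercept_.shape)

# Evaluate f_k(x) = w_k . x + b_k by hand for every class k.
manual = X @ clf.coef_.T + clf.intercept_

# Scores as reported by the library.
scores = clf.decision_function(X)

# True when decision_function returns the raw f values, with no normalization.
matches = np.allclose(manual, scores)
print(matches)
```

If matches is True, the scores are the unnormalized f values rather than distances divided by a weight norm.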
To be more specific, I'm assuming that LinearSVC uses the standard trick of adding a constant 1 feature to represent the threshold. (This might be wrong.) For my example problem this gives a three-dimensional feature space where instances are always of the form (1,x1,x2). Assuming no other threshold term, the algorithm learns a hyperplane w=(w0, w1, w2) that goes through the origin in this three-dimensional space. Now I get a point to predict, call it z=(1,a,b). What is the signed distance (margin) of this point to the hyperplane? It's just dot(w,z)/norm(w), where norm(w) is the 2-norm of w. The LinearSVC code is returning dot(w,z).
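To make the quantities concrete, here is a small sketch using the f0 weights from above. The point (a, b) = (1.0, 2.0) is made up for illustration. I compute the raw score, the distance in the augmented three-dimensional space (my assumption above), and, for comparison, the ordinary point-to-line distance in the original two-dimensional space, which divides by the norm of the feature weights only.

```python
import math

# Weights for f0 from above, in augmented form w = (w0, w1, w2),
# where w0 is the intercept paired with the constant-1 feature.
w = (0.89433196, 1.99861379, -0.09489263)

# Hypothetical point to predict: z = (1, a, b).
a, b = 1.0, 2.0
z = (1.0, a, b)

# Raw score dot(w, z); this is what decision_function appears to return.
f = sum(wi * zi for wi, zi in zip(w, z))

# Signed distance in the augmented 3-d space (my assumption above).
dist_augmented = f / math.sqrt(sum(wi * wi for wi in w))

# Signed distance from (a, b) to the line f0 = 0 in the original 2-d space,
# normalizing by the feature weights (w1, w2) only.
dist_original = f / math.hypot(w[1], w[2])

print(f, dist_augmented, dist_original)
```

Either way, the raw score and the distance differ only by a positive scale factor per hyperplane, so they always have the same sign.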
Thanks, Chris