I'm trying to do the following simple classification using the LinearSVC object in scikit-learn. I've tried using both version 0.10 and 0.14. Using the code:
from sklearn.svm import LinearSVC, SVC
from numpy import *
data = array([[ 1007., 1076.],
[ 1017., 1009.],
[ 2021., 2029.],
[ 2060., 2085.]])
groups = array([1, 1, 2, 2])
svc = LinearSVC()
svc.fit(data, groups)
svc.predict(data)
I get the output:
array([2, 2, 2, 2])
However, if I replace the classifier with
svc = SVC(kernel='linear')
then I get the result
array([ 1., 1., 2., 2.])
which is correct. Does anyone know why using LinearSVC would botch this simple problem?
The algorithm underlying LinearSVC is very sensitive to extreme values in its input. (The warning refers to the LibLinear FAQ, since scikit-learn's LinearSVC is based on that library.) You should normalize before fitting:
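A minimal sketch of one way to do that, assuming a scikit-learn version that ships `StandardScaler` and `make_pipeline` (both newer than 0.10). The pipeline, the `random_state=0` setting, and the variable names `model` and `predictions` are illustrative choices, not part of the original question:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Same data and labels as in the question.
data = np.array([[1007., 1076.],
                 [1017., 1009.],
                 [2021., 2029.],
                 [2060., 2085.]])
groups = np.array([1, 1, 2, 2])

# Standardize each feature to zero mean and unit variance, then fit LinearSVC.
# random_state is fixed only to make the sketch reproducible (an assumption).
model = make_pipeline(StandardScaler(), LinearSVC(random_state=0))
model.fit(data, groups)

predictions = model.predict(data)
print(predictions)
```

With the features on a comparable scale, the two groups sit on opposite sides of the origin and the classifier should separate them. Wrapping the scaler in a pipeline also means the same scaling is applied automatically to any new data passed to `predict`.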