I am working on a text classification task and want to use ensemble.AdaBoostClassifier with LinearSVC as the base_estimator. However, when I run the following code:
clf = AdaBoostClassifier(svm.LinearSVC(),n_estimators=50, learning_rate=1.0, algorithm='SAMME.R')
clf.fit(X, y)
an error occurs: TypeError: AdaBoostClassifier with algorithm='SAMME.R' requires that the weak learner supports the calculation of class probabilities with a predict_proba method
The first question is: can't svm.LinearSVC() calculate class probabilities? How can I make it calculate them?
Then I changed the algorithm parameter and ran the code again:
clf = AdaBoostClassifier(svm.LinearSVC(),n_estimators=50, learning_rate=1.0, algorithm='SAMME')
clf.fit(X, y)
This time TypeError: fit() got an unexpected keyword argument 'sample_weight'
occurs. The AdaBoostClassifier documentation says: Sample weights. If None, the sample weights are initialized to 1 / n_samples.
Even if I assign an integer to n_samples, the error still occurs.
The second question is: what does n_samples mean, and how can I solve this problem?
I hope someone can help me.
Following @jme's comment, I then tried
clf = AdaBoostClassifier(svm.SVC(kernel='linear',probability=True),n_estimators=10, learning_rate=1.0, algorithm='SAMME.R')
clf.fit(X, y)
The program never produces a result, and the memory used on the server stays unchanged.
The third question is: how can I make AdaBoostClassifier work with SVC as the base_estimator?
The right answer depends on exactly what you're looking for. LinearSVC cannot predict class probabilities (required by the default SAMME.R algorithm used by AdaBoostClassifier) and does not support sample_weight.
You should be aware that the Support Vector Machine does not nominally predict class probabilities. They are computed using Platt scaling (or an extension of Platt scaling in the multi-class case), a technique which has known issues. If you need less "artificial" class probabilities, an SVM might not be the way to go.
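To make the distinction concrete, here is a minimal sketch using a synthetic make_classification dataset in place of your text data: LinearSVC exposes no predict_proba at all, while SVC only gains one when probability=True, which fits an internal Platt-scaling model via cross-validation on top of the SVM (and makes training noticeably slower).

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# LinearSVC never exposes predict_proba:
print(hasattr(LinearSVC(), "predict_proba"))  # False

# SVC gains predict_proba only with probability=True, which fits an
# internal Platt-scaling model via cross-validation on top of the SVM.
clf = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)
print(clf.predict_proba(X[:2]).shape)  # (2, 2); each row sums to 1
```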
With that said, I believe the most satisfying answer to your question is the one given by Graham: use SGDClassifier with a hinge loss function and set AdaBoostClassifier to use the SAMME algorithm, which does not require a predict_proba method but does require support for sample_weight.
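A minimal sketch of that suggestion, again on a synthetic make_classification dataset (the weak learner is passed positionally to AdaBoostClassifier because the parameter was renamed from base_estimator to estimator in newer scikit-learn releases, which also deprecate the algorithm parameter):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# SGDClassifier with hinge loss is a linear SVM trained by SGD; it accepts
# sample_weight (which SAMME needs) but has no predict_proba, so the
# SAMME.R algorithm would still fail with it.
clf = AdaBoostClassifier(
    SGDClassifier(loss="hinge", random_state=0),
    n_estimators=10,
    algorithm="SAMME",
)
clf.fit(X, y)
print(clf.score(X, y))
```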
Perhaps the best answer would be to use a classifier that has native support for class probabilities, like logistic regression, if you want to use the default algorithm of AdaBoostClassifier. You can do this with sklearn.linear_model.LogisticRegression, or with SGDClassifier and a log loss function, as in the code provided by Kris.
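For example (again a sketch on a synthetic dataset), logistic regression supports both predict_proba and sample_weight out of the box, so it works with AdaBoostClassifier's defaults with no extra configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# LogisticRegression natively implements predict_proba and accepts
# sample_weight, so AdaBoostClassifier can use it directly.
clf = AdaBoostClassifier(LogisticRegression(max_iter=1000), n_estimators=10)
clf.fit(X, y)
print(clf.predict_proba(X[:2]).shape)  # (2, 2)
```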
Hope that helps. If you're curious about what Platt scaling is, check out the original paper by John Platt here.