How to ensemble SVM and Logistic Regression with Python

I am working on a text classification task (7000 texts evenly distributed over 10 labels). I have tried SVM and Logistic Regression:

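from sklearn import svm, linear_model

# Linear SVM: fit on the training data, then evaluate on the test set
clf1 = svm.LinearSVC()
clf1.fit(X, y)
pred1 = clf1.predict(X_test)
score1 = clf1.score(X_test, y_true)

# Logistic regression: same procedure
clf2 = linear_model.LogisticRegression()
clf2.fit(X, y)
pred2 = clf2.predict(X_test)
score2 = clf2.score(X_test, y_true)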

I got two accuracies, score1 and score2. I wonder whether I could improve my accuracy by building an ensemble that combines the outputs of the two classifiers above. I have studied ensemble methods on my own and I know there are bagging, boosting, and stacking. However, I do not know how to use the predictions from my SVM and Logistic Regression in an ensemble. Could anyone give me some ideas or show me some example code?

There is 1 answer

Answered by Dayvid Oliveira

You can just multiply the class probabilities predicted by the two classifiers, or use another combination rule (for example mean, max, min, or median).
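As a rough sketch of what multiplying the probabilities could look like with plain scikit-learn: note that LinearSVC does not provide predict_proba, so the example below wraps it in CalibratedClassifierCV to obtain probabilities. That wrapper, the synthetic data from make_classification, and the helper name combine are assumptions made for illustration only; they are not part of the question's setup.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the text features (assumption: use your own X, y instead).
X_all, y_all = make_classification(
    n_samples=600, n_features=40, n_informative=15, n_classes=3, random_state=0
)
X, X_test, y, y_true = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0
)

# LinearSVC has no predict_proba, so calibrate it to get class probabilities.
clf1 = CalibratedClassifierCV(LinearSVC(), cv=3).fit(X, y)
clf2 = LogisticRegression(max_iter=1000).fit(X, y)

p1 = clf1.predict_proba(X_test)
p2 = clf2.predict_proba(X_test)


def combine(probas, rule="product"):
    """Combine per-classifier probability matrices (hypothetical helper)."""
    stacked = np.stack(probas)  # shape: (n_classifiers, n_samples, n_classes)
    if rule == "product":
        combined = stacked.prod(axis=0)
    elif rule == "mean":
        combined = stacked.mean(axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    # Renormalise so each row sums to 1 again.
    return combined / combined.sum(axis=1, keepdims=True)


combined = combine([p1, p2], rule="product")
# Both classifiers sort their classes_ the same way, so either one can map
# column indices back to labels.
y_pred = clf2.classes_[combined.argmax(axis=1)]
accuracy = (y_pred == y_true).mean()
print("ensemble accuracy:", accuracy)
```

Switching rule="product" to rule="mean" gives the averaging rule; whether either one beats the individual classifiers depends on the data.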

To do that in a more generic way (and try several rules), you can use brew.

from brew.base import Ensemble
from brew.base import EnsembleClassifier
from brew.combination.combiner import Combiner

# create your Ensemble
clfs = [clf1, clf2]
ens = Ensemble(classifiers=clfs)

# Since you have only 2 classifiers, 'majority_vote' is not an option.
# Available rules: ['mean', 'majority_vote', 'max', 'min', 'median']
comb = Combiner(rule='mean')

# now create your ensemble classifier
ensemble_clf = EnsembleClassifier(ensemble=ens, combiner=comb)
ensemble_clf.predict(X_test)

Also, keep in mind that the classifiers should be diverse enough to give a good combination result.
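One simple way to get a feel for how diverse two classifiers are is to measure how often their predictions differ on held-out data. The sketch below is only an illustration under assumptions: the data is synthetic (make_classification), and the disagreement rate is just one of several possible diversity measures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the text features (assumption: use your own X, y instead).
X_all, y_all = make_classification(
    n_samples=600, n_features=40, n_informative=15, n_classes=3, random_state=0
)
X, X_test, y, y_true = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0
)

pred1 = LinearSVC().fit(X, y).predict(X_test)
pred2 = LogisticRegression(max_iter=1000).fit(X, y).predict(X_test)

# Fraction of test samples on which the two classifiers predict different labels.
disagreement = np.mean(pred1 != pred2)

# Fraction of test samples on which exactly one of the two is correct.
only_one_right = np.mean((pred1 == y_true) != (pred2 == y_true))

print("disagreement rate:", disagreement)
print("exactly one correct:", only_one_right)
```

If the disagreement rate is close to zero, the two models make nearly the same predictions and combining them is unlikely to change much.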

If you had fewer features, I'd say you should check out Dynamic Classifier/Ensemble Selection (also provided in brew), but since you probably have many features, Euclidean distance probably does not make sense for finding the region of competence of each classifier. The best thing is to check by hand, based on the confusion matrix, which kinds of labels each classifier tends to get right.
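A minimal sketch of that by-hand check, assuming scikit-learn's confusion_matrix and synthetic data in place of the real texts: it computes the per-class recall of each classifier so you can see which labels each one handles better.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the text features (assumption: use your own X, y instead).
X_all, y_all = make_classification(
    n_samples=600, n_features=40, n_informative=15, n_classes=3, random_state=0
)
X, X_test, y, y_true = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0
)
labels = np.unique(y_all)

pred1 = LinearSVC().fit(X, y).predict(X_test)
pred2 = LogisticRegression(max_iter=1000).fit(X, y).predict(X_test)

# Rows are true labels, columns are predicted labels.
cm1 = confusion_matrix(y_true, pred1, labels=labels)
cm2 = confusion_matrix(y_true, pred2, labels=labels)

# Per-class recall: correct predictions for a label divided by its true count.
recall1 = cm1.diagonal() / cm1.sum(axis=1)
recall2 = cm2.diagonal() / cm2.sum(axis=1)

for label, r1, r2 in zip(labels, recall1, recall2):
    print(f"label {label}: LinearSVC recall={r1:.2f}, LogisticRegression recall={r2:.2f}")
```

Labels where one classifier is clearly ahead are the ones where a combination has the most to gain.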