I am working on a text classification task (7000 texts evenly distributed across 10 labels). I have tried SVM and Logistic Regression:
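from sklearn import svm, linear_model
# Linear SVM: fit on training data, keep its test-set predictions
clf1 = svm.LinearSVC()
clf1.fit(X, y)
y_pred1 = clf1.predict(X_test)
score1 = clf1.score(X_test, y_true)
# Logistic Regression: same procedure
clf2 = linear_model.LogisticRegression()
clf2.fit(X, y)
y_pred2 = clf2.predict(X_test)
score2 = clf2.score(X_test, y_true)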
I got two accuracies, score1 and score2.
I wonder whether I could improve my accuracy by building an ensemble that combines the outputs of the two classifiers above.
I have taught myself the basics of ensemble learning, and I know there are bagging, boosting, and stacking.
However, I do not know how to use the scores predicted by my SVM and Logistic Regression in an ensemble. Could anyone give me some ideas or show me some example code?
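Of the three approaches mentioned, stacking is the one that directly uses the outputs of already-chosen base classifiers. A minimal sketch with scikit-learn's `StackingClassifier` (available since scikit-learn 0.22) follows; it assumes synthetic data from `make_classification` as a stand-in for the real text features, and uses a Logistic Regression as the meta-learner, which is one choice among many.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic 10-class data standing in for the real (e.g. TF-IDF) features
X, y = make_classification(n_samples=2000, n_features=50, n_informative=20,
                           n_classes=10, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Base learners produce out-of-fold scores (cv=5); the final estimator
# learns how to weight them. LinearSVC contributes decision_function values.
stack = StackingClassifier(
    estimators=[("svm", LinearSVC(max_iter=5000)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stack_score = stack.score(X_test, y_test)
print(f"stacking accuracy: {stack_score:.3f}")
```

Because the meta-learner is trained on out-of-fold predictions, it does not simply learn to trust whichever base model overfits the training set most.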
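You can simply multiply the class probabilities predicted by each classifier (the product rule), or use another combination rule such as the mean, the max, or a majority vote.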
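One caveat: `LinearSVC` has no `predict_proba`. A minimal sketch of the product rule, assuming we wrap the SVM in `CalibratedClassifierCV` to obtain probability estimates, and again using synthetic data in place of the real features:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=50, n_informative=20,
                           n_classes=10, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Calibration turns LinearSVC decision values into class probabilities
svm_clf = CalibratedClassifierCV(LinearSVC(max_iter=5000), cv=5).fit(X_train, y_train)
lr_clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

p_svm = svm_clf.predict_proba(X_test)  # shape (n_samples, n_classes)
p_lr = lr_clf.predict_proba(X_test)

# Product rule: multiply per-class probabilities, then take the argmax.
# Both models sort classes_ the same way, so the columns line up.
p_prod = p_svm * p_lr
y_pred = lr_clf.classes_[np.argmax(p_prod, axis=1)]
prod_acc = np.mean(y_pred == y_test)
print(f"product-rule accuracy: {prod_acc:.3f}")
```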
To do this more generically (trying several rules), you can use brew.
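If you would rather not add a dependency, the common fixed combination rules are short enough to write by hand. The sketch below is plain NumPy and does not reproduce brew's API; the function name `combine` is hypothetical. It takes a list of `(n_samples, n_classes)` probability arrays and returns predicted class indices.

```python
import numpy as np

def combine(probas, rule="mean"):
    """Hypothetical helper: fuse per-classifier probabilities with a fixed rule."""
    stacked = np.stack(probas)  # (n_classifiers, n_samples, n_classes)
    if rule == "mean":
        scores = stacked.mean(axis=0)
    elif rule == "product":
        scores = stacked.prod(axis=0)
    elif rule == "max":
        scores = stacked.max(axis=0)
    elif rule == "min":
        scores = stacked.min(axis=0)
    elif rule == "majority":
        votes = stacked.argmax(axis=2)  # (n_classifiers, n_samples)
        n_classes = stacked.shape[2]
        # Count votes per sample; ties go to the lowest class index
        scores = np.stack([np.bincount(v, minlength=n_classes) for v in votes.T])
    else:
        raise ValueError(f"unknown rule: {rule}")
    return scores.argmax(axis=1)

# Three classifiers, one sample, three classes
p1 = np.array([[0.7, 0.2, 0.1]])
p2 = np.array([[0.1, 0.5, 0.4]])
p3 = np.array([[0.2, 0.45, 0.35]])
for rule in ["mean", "product", "max", "min", "majority"]:
    print(rule, combine([p1, p2, p3], rule))
```

Here the single confident classifier wins under `max` (class 0), while every other rule picks class 1. Those are exactly the kind of differences you want to check on a validation set before settling on one rule.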
Also keep in mind that the classifiers need to be diverse (that is, make different errors) for the combination to improve on the best single model.
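A quick way to gauge diversity is to compare where each classifier is right or wrong on held-out data. The sketch below computes the pairwise disagreement measure (fraction of samples where exactly one of the two is correct) and the "oracle" accuracy (at least one is correct), which is an upper bound for any rule that picks between the two. The helper name `diversity_report` is hypothetical.

```python
import numpy as np

def diversity_report(y_true, pred_a, pred_b):
    """Hypothetical helper: simple pairwise diversity statistics."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    correct_a = pred_a == y_true
    correct_b = pred_b == y_true
    return {
        "disagreement": float(np.mean(correct_a != correct_b)),  # exactly one is right
        "oracle": float(np.mean(correct_a | correct_b)),          # at least one is right
        "both_wrong": float(np.mean(~correct_a & ~correct_b)),    # no rule can fix these
    }

report = diversity_report([0, 1, 2, 3], [0, 1, 0, 0], [0, 2, 2, 0])
print(report)
```

If the disagreement is close to zero, the two models make the same mistakes and combining them will not help much.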
If you had fewer features, I would suggest looking into Dynamic Classifier/Ensemble Selection (also provided in brew). However, since you probably have many features, Euclidean distance is unlikely to be meaningful for estimating each classifier's region of competence. The best option is to check by hand which labels each classifier tends to get right, based on its confusion matrix.
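Per-class recall (the diagonal of the confusion matrix divided by its row sums) makes that comparison concrete. A minimal sketch with toy labels follows; the helper name `per_class_recall` is hypothetical, and it assumes every label in `labels` occurs at least once in `y_true` (otherwise the division hits a zero row).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_recall(y_true, y_pred, labels):
    """Hypothetical helper: fraction of each true class predicted correctly."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    return cm.diagonal() / cm.sum(axis=1)

labels = [0, 1, 2]
y_true = [0, 0, 1, 1, 2, 2]
y_svm = [0, 0, 1, 0, 2, 1]
y_lr = [0, 1, 1, 1, 2, 2]

r_svm = per_class_recall(y_true, y_svm, labels)
r_lr = per_class_recall(y_true, y_lr, labels)
# Which classifier to trust for each label (ties go to the SVM)
best = np.where(r_svm >= r_lr, "svm", "lr")
for label, a, b, who in zip(labels, r_svm, r_lr, best):
    print(f"label {label}: svm={a:.2f} lr={b:.2f} -> {who}")
```

A pattern like this, where each model is clearly better on different labels, is a good sign that a combination (or a simple per-label choice) can beat either model alone.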