Spiral data classification


I have separable two-class spiral data, blue and red, spiraling out from the origin. I know KNN and SVM are suitable for this classification purpose, but I wonder: can I achieve a reasonably good classification result using logistic regression?

I have tried a couple of feature sets, such as (r, theta) and (sin x, sin y, r), among others, but none seems to work well.
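For context, a minimal sketch (my reconstruction, not the asker's code) of how those feature sets could be computed from the raw (x, y) coordinates; the data here is a random placeholder:

import numpy as np

# Hypothetical stand-in for the (x, y) spiral coordinates
X = np.random.default_rng(0).normal(size=(100, 2))

# Polar features: note that arctan2 wraps theta into (-pi, pi],
# while a spiral keeps winding, which is one reason raw (r, theta) can struggle
r = np.hypot(X[:, 0], X[:, 1])
theta = np.arctan2(X[:, 1], X[:, 0])
X_polar = np.column_stack([r, theta])

# (sin x, sin y, r) features
X_trig = np.column_stack([np.sin(X[:, 0]), np.sin(X[:, 1]), r])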


There is 1 answer

Muhammed Yunus

I know KNN and SVM are suitable for this classification purpose, but I wonder: can I achieve a reasonably good classification result using logistic regression?

An SVM is a linear model that can implicitly transform features into a higher-dimensional space. If you transform the features beforehand and then feed them into a logistic regression, you can get results similar to an SVM's (the two models define and optimise different objectives, so their decision boundaries won't be identical).
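To make that concrete, here is a minimal sketch (my addition, not part of the original answer) of an explicit RBF feature map: pick a set of landmark points and describe every sample by its RBF similarity to each landmark. The rbf_features helper and the gamma value are hypothetical; Nystroem and RBFSampler in the code below approximate the same mapping more efficiently.

import numpy as np

# Hypothetical helper: explicit RBF feature map against landmark points.
# Feature j of sample i is exp(-gamma * ||X[i] - landmarks[j]||^2).
def rbf_features(X, landmarks, gamma=0.5):
    # Squared Euclidean distance from every sample to every landmark
    sq_dists = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

With the training points themselves as landmarks, a logistic regression fit on rbf_features(X, X) learns a linear boundary in that similarity space.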

The code below explicitly maps the features to an RBF space (an RBF SVM does this implicitly) and then supplies the transformed features to LogisticRegression(). Results from SVC(kernel='rbf') are included for comparison.

[Image: decision boundary plots for logistic regression on RBF features and for the RBF-kernel SVM]

import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import make_moons


# Create a two-class dataset (interleaved half-moons as a stand-in for the spirals)
X, y = make_moons(n_samples=100, noise=0.2, random_state=0)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap='seismic')
plt.gcf().set_size_inches(5, 3)

# Create RBF features and feed them into the logistic regression model
from sklearn.linear_model import LogisticRegression
from sklearn.kernel_approximation import RBFSampler, Nystroem
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    # ('rbf_features', RBFSampler(gamma=0.5, random_state=0)), #faster & approximate
    ('rbf_features', Nystroem(random_state=0)),
    ('logistic_regression', LogisticRegression(C=20, random_state=0))
])
#Fit logistic regression model on the new features
pipeline.fit(X, y)

#View the decision boundary
xx, yy = np.meshgrid(
    np.linspace(-2.5, 2.5, num=50),
    np.linspace(-2.5, 2.5, num=50)
)

proba_map = pipeline.predict_proba(np.column_stack([xx.ravel(), yy.ravel()]))
proba_map = proba_map[:, 1].reshape(xx.shape)

plt.contourf(xx, yy, proba_map, zorder=-1, cmap='coolwarm', alpha=0.5)
plt.xlabel('feature 0')
plt.ylabel('feature 1')
plt.title('Logistic regression fit on RBF features')
plt.colorbar(label='probability')
plt.show()

#
#SVM for comparison
#
from sklearn.svm import SVC
svc = SVC(kernel='rbf', probability=True).fit(X, y)

#Get probabilities (or decision map)
# decision_values = svc.decision_function(np.column_stack([xx.ravel(), yy.ravel()]))
proba_map = svc.predict_proba(np.column_stack([xx.ravel(), yy.ravel()]))[:, 1]
proba_map = proba_map.reshape(xx.shape)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap='seismic')
plt.gcf().set_size_inches(5, 3)
plt.contourf(xx, yy, proba_map, zorder=-1, cmap='coolwarm', alpha=0.5)
plt.xlabel('feature 0')
plt.ylabel('feature 1')
plt.title('SVM with RBF kernel')
plt.colorbar(label='probability')
plt.show()
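The answer demonstrates on make_moons, but the same pipeline can be refit on spiral data. Below is a sketch under the assumption of a simple two-arm Archimedean spiral generator (the asker's actual data isn't shown); Nystroem's gamma and n_components may need tuning, since spirals wind more tightly than moons.

# Hypothetical two-class spiral data standing in for the asker's
rng = np.random.default_rng(0)
n = 200
t = np.linspace(0.25, 3 * np.pi, n)

arm_a = np.column_stack([t * np.cos(t), t * np.sin(t)])
arm_b = np.column_stack([t * np.cos(t + np.pi), t * np.sin(t + np.pi)])

X_spiral = np.vstack([arm_a, arm_b]) + rng.normal(scale=0.2, size=(2 * n, 2))
y_spiral = np.repeat([0, 1], n)

# Refit the same RBF-features + logistic regression pipeline from above
pipeline.fit(X_spiral, y_spiral)
print('Training accuracy:', pipeline.score(X_spiral, y_spiral))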