Identifying filtered features after feature selection with scikit-learn


Here is my code for feature selection in Python:

from sklearn.svm import LinearSVC
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
X.shape  # (150, 4)
X_new = LinearSVC(C=0.01, penalty="l1", dual=False).fit_transform(X, y)
X_new.shape  # (150, 3)

But after getting the new X (the reduced feature matrix, X_new), how do I know which variables were removed and which were kept? In other words, which one was dropped, and which three are still present in the data?

I need to identify them so that I can apply the same filtering to new test data.


1 answer

Answer by pyan (best answer)

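I modified your code a little. For each class, you can see which features are used by looking at the coefficients of the fitted LinearSVC: with the L1 penalty, a feature whose coefficient is zero for every class contributes nothing, and that is the feature that gets dropped. According to the documentation, coef_ : array, shape = [n_features] if n_classes == 2 else [n_classes, n_features], so for iris (3 classes, 4 features) it has shape (3, 4).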

As for new data, you just need to call transform on it using the same fitted object.

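from sklearn.svm import LinearSVC
from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)

# Keep a reference to the fitted model so it can be inspected and reused.
# Note: calling fit_transform/transform on the estimator itself only works in
# older scikit-learn releases; later releases moved this to SelectFromModel.
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False)
X_new = lsvc.fit_transform(X, y)
print(X_new.shape)

# One row per class, one column per feature; a column that is (near) zero
# for every class corresponds to a dropped feature.
print(lsvc.coef_)

# Apply the same feature filtering to new data.
newData = np.random.rand(100, 4)
newData_X = lsvc.transform(newData)
print(newData_X.shape)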
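
On recent scikit-learn versions, where estimators such as LinearSVC no longer expose transform themselves, the same idea can be sketched with SelectFromModel. This is a minimal sketch rather than the answerer's original code: the variable names (selector, kept_names, removed_names, new_data) are illustrative, and which features survive may depend on your scikit-learn version, so no specific output is claimed here. get_support() returns a boolean mask over the original columns, which answers the "which features were kept" question directly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

iris = load_iris()
X, y = iris.data, iris.target

# Wrap the L1-penalized model in a selector and fit it on the training data.
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False)
selector = SelectFromModel(lsvc).fit(X, y)

# Boolean mask over the original columns: True means the feature was kept.
mask = selector.get_support()
kept_idx = selector.get_support(indices=True)

# Map the mask back to feature names (illustrative variable names).
kept_names = [iris.feature_names[i] for i in kept_idx]
removed_names = [name for name, keep in zip(iris.feature_names, mask) if not keep]
print("kept:", kept_names)
print("removed:", removed_names)

# The fitted coefficients are still available for inspection.
print(selector.estimator_.coef_)

# Apply the same filtering to the training data and to new data.
X_new = selector.transform(X)
new_data = np.random.rand(100, X.shape[1])
new_data_X = selector.transform(new_data)
print(X_new.shape, new_data_X.shape)
```

Because the selector stores the mask after fitting, calling selector.transform on any array with the same four columns keeps exactly the same features, which is what is needed for test data.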