Why doesn't the cross-validation results summary in GridSearchCV (.cv_results_) mask irrelevant parameters as expected?

33 views Asked by Nolatar At 25 October 2023 at 14:15

I am fitting an SVC using sklearn with GridSearchCV over a set of parameters, and I expect some entries in the grid's cv_results_ attribute to be masked when a parameter does not apply to that particular grid run (e.g. when the kernel is linear, I expect the gamma entry to be masked). Instead, all entries are populated regardless of relevance, and no mask is applied, unlike the example shown in the documentation.

Running the documented example on sklearn versions 1.2.1 and 1.3.1:

>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10], 'gamma': [0.1, 1]}
>>> svc = svm.SVC()
>>> clf = GridSearchCV(svc, parameters)
>>> clf.fit(iris.data, iris.target)

... my clf.cv_results_ shows all 2^3 = 8 combinations, with all masks of the masked_array as False. Copying the relevant entries from clf.cv_results_:

{'mean_fit_time': array([0.00106173, 0.00110359, 0.00094199, 0.00119796, 0.00092278, 0.00104046, 0.00092359, 0.00126338]),
 'std_fit_time': array([2.33411589e-04, 1.05827065e-04, 1.01234561e-04, 4.21319023e-05, 6.95271419e-05, 1.32059704e-04, 1.18666692e-04, 2.69801231e-04]),
 'param_C': masked_array(data=[1, 1, 1, 1, 10, 10, 10, 10],
              mask=[False, False, False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'param_gamma': masked_array(data=[0.1, 0.1, 1, 1, 0.1, 0.1, 1, 1],
              mask=[False, False, False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'param_kernel': masked_array(data=['linear', 'rbf', 'linear', 'rbf', 'linear', 'rbf',
                    'linear', 'rbf'],
              mask=[False, False, False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'params': [{'C': 1, 'gamma': 0.1, 'kernel': 'linear'},
  {'C': 1, 'gamma': 0.1, 'kernel': 'rbf'},
  {'C': 1, 'gamma': 1, 'kernel': 'linear'},
  {'C': 1, 'gamma': 1, 'kernel': 'rbf'},
  {'C': 10, 'gamma': 0.1, 'kernel': 'linear'},
  {'C': 10, 'gamma': 0.1, 'kernel': 'rbf'},
  {'C': 10, 'gamma': 1, 'kernel': 'linear'},
  {'C': 10, 'gamma': 1, 'kernel': 'rbf'}]
}

To my understanding, every other mask for param_gamma should be True (whenever the kernel is linear).
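A quick check that may help narrow this down: as far as I can tell, a param_* entry in cv_results_ is only masked when that parameter is absent from a candidate's params dict, not when the estimator happens to ignore it. The sketch below uses ParameterGrid (the helper GridSearchCV relies on to enumerate candidates) with the same grid as above; the variable names candidates and missing_gamma are just for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Same single-dict grid as in the question.
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10], 'gamma': [0.1, 1]}

# A single dict is expanded as a full Cartesian product of all values.
candidates = list(ParameterGrid(parameters))
print(len(candidates))  # 2 * 2 * 2 = 8

# Candidates that do not set gamma at all (illustrative name).
# With a single dict every candidate carries every key, so this is empty
# and there is nothing for cv_results_ to mask.
missing_gamma = [c for c in candidates if 'gamma' not in c]
print(missing_gamma)
```

So with one flat dictionary, gamma is set explicitly for the linear kernel too, which would be consistent with every mask being False.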

I know irrelevant parameters are ignored by the estimator, but their presence in the grid still triggers separate training runs, judging by the separate entries in .cv_results_ and their slightly different fit times in cross-validation. This is unnecessarily computationally taxing. How do I avoid this behaviour?
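One approach that seems to avoid the redundant fits is to pass param_grid as a list of dictionaries, one per kernel, so that gamma is only listed where it applies. This is a minimal sketch on the same iris data, not necessarily the only way to do it; the names n_candidates and gamma_mask are mine.

```python
import numpy as np
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

# One sub-grid per kernel: gamma appears only in the rbf sub-grid.
param_grid = [
    {'kernel': ['linear'], 'C': [1, 10]},
    {'kernel': ['rbf'], 'C': [1, 10], 'gamma': [0.1, 1]},
]

clf = GridSearchCV(svm.SVC(), param_grid)
clf.fit(iris.data, iris.target)

# 2 linear candidates + 4 rbf candidates = 6, instead of 8.
n_candidates = len(clf.cv_results_['params'])
print(n_candidates)

# gamma is masked for the candidates whose sub-grid did not include it.
gamma_mask = np.ma.getmaskarray(clf.cv_results_['param_gamma'])
print(gamma_mask)
```

With this grid the linear kernel is fitted once per value of C, and param_gamma is masked for those rows, which matches the behaviour the documentation example shows.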
