Retrieving Sklearn typehints during runtime


I am creating an app to train and use Sklearn models. The user should be able to select any of the Sklearn models and then change the arguments of the selected algorithm before training or running it. To achieve this efficiently for all current and future Sklearn models, I want to retrieve the type hints and default values of the arguments at runtime and generate the UI automatically from them.

The Sklearn modules do not use type hints, though VS Code autocomplete does 'know' the types and default values (from what I could find, VS Code uses typeshed, but that seems difficult to access at runtime).

In short:
I want to pass sklearn.linear_model.LinearRegression to something and receive:

(*, fit_intercept: bool = True, copy_X: bool = True, n_jobs: int | None = None, positive: bool = False)

How would I go about retrieving this information at runtime?


There are 2 answers

Wouter On BEST ANSWER

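Sklearn estimator classes (in recent versions) have a private class attribute called _parameter_constraints, a dict that maps each parameter name to a list of allowed constraints. In my case, it seems easiest to deduce the possible types and the matching UI element from these constraints. Because the attribute is private, its format may change between releases.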

E.g., for linear regression:

# Integral comes from the standard library: from numbers import Integral
_parameter_constraints: dict = {
    "fit_intercept": ["boolean"],
    "copy_X": ["boolean"],
    "n_jobs": [None, Integral],
    "positive": ["boolean"],
}
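As a rough illustration of how such constraints could drive a UI, here is a minimal sketch. The widget names ("checkbox", "integer_spinbox", and so on) and the helper functions are hypothetical, and the dict is copied from the example above so the snippet runs without scikit-learn. Real constraint lists can also contain other objects (intervals, string options, callables), which this sketch simply maps to a text field.

```python
from numbers import Integral, Real

# Copied from the example above. With scikit-learn installed, the same dict
# would presumably come from LinearRegression._parameter_constraints
# (a private attribute, so this is an assumption about current versions).
LINEAR_REGRESSION_CONSTRAINTS = {
    "fit_intercept": ["boolean"],
    "copy_X": ["boolean"],
    "n_jobs": [None, Integral],
    "positive": ["boolean"],
}


def widget_for(constraints):
    """Return (widget_name, allows_none) for one parameter's constraint list.

    The widget names are hypothetical placeholders for real UI elements.
    """
    allows_none = any(c is None for c in constraints)
    for c in constraints:
        if isinstance(c, str) and c == "boolean":
            return "checkbox", allows_none
        if c is Integral:
            return "integer_spinbox", allows_none
        if c is Real:
            return "float_spinbox", allows_none
    # Fallback for constraint kinds this sketch does not understand.
    return "text", allows_none


def build_form_spec(constraints_by_param):
    """Map every parameter name to its (widget_name, allows_none) pair."""
    return {name: widget_for(cs) for name, cs in constraints_by_param.items()}


form_spec = build_form_spec(LINEAR_REGRESSION_CONSTRAINTS)
for name, (widget, allows_none) in form_spec.items():
    print(name, widget, "optional" if allows_none else "required")
```

The first matching constraint wins, which is a simplification: a parameter that accepts several kinds of values would need a richer widget.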
Corralien On

sklearn uses numpy-style docstrings, so you can parse the parameter types with numpydoc and get the default values from the signature with inspect:

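import inspect

import sklearn.linear_model
from numpydoc.docscrape import ClassDoc

# Parsed docstring sections (parameter names, type strings, descriptions)
doc = ClassDoc(sklearn.linear_model.LinearRegression)
# Constructor signature (parameter names and default values)
sig = inspect.signature(sklearn.linear_model.LinearRegression)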

Usage:

>>> sig
<Signature (*, fit_intercept=True, copy_X=True, n_jobs=None, positive=False)>

>>> doc['Parameters']
[Parameter(name='fit_intercept', type='bool, default=True', desc=['Whether to calculate the intercept for this model. If set', 'to False, no intercept will be used in calculations', '(i.e. data is expected to be centered).']), 
 Parameter(name='copy_X', type='bool, default=True', desc=['If True, X will be copied; else, it may be overwritten.']),
 Parameter(name='n_jobs', type='int, default=None', desc=['The number of jobs to use for the computation. This will only provide', 'speedup in case of sufficiently large problems, that is if firstly', '`n_targets > 1` and secondly `X` is sparse or if `positive` is set', 'to `True`. ``None`` means 1 unless in a', ':obj:`joblib.parallel_backend` context. ``-1`` means using all', 'processors. See :term:`Glossary <n_jobs>` for more details.']),
 Parameter(name='positive', type='bool, default=False', desc=['When set to ``True``, forces the coefficients to be positive. This', 'option is only supported for dense arrays.', '', '.. versionadded:: 0.24'])]
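
To turn these two pieces into something a UI generator can use, one option is to read the defaults from the signature and split the docstring type field at ", default=". The sketch below is only an illustration: FakeLinearRegression is a hypothetical stand-in with the same constructor signature (so the snippet runs without scikit-learn), and the helper names are invented. It also assumes the type field follows the "type, default=value" pattern shown above, which not every docstring does.

```python
import inspect


class FakeLinearRegression:
    """Hypothetical stand-in mirroring LinearRegression's constructor."""

    def __init__(self, *, fit_intercept=True, copy_X=True, n_jobs=None,
                 positive=False):
        self.fit_intercept = fit_intercept
        self.copy_X = copy_X
        self.n_jobs = n_jobs
        self.positive = positive


def parameter_defaults(cls):
    """Return {parameter name: default value} from the constructor signature."""
    sig = inspect.signature(cls)
    return {
        name: param.default
        for name, param in sig.parameters.items()
        if param.default is not inspect.Parameter.empty
    }


def split_doc_type(type_field):
    """Split a numpydoc type field such as 'bool, default=True'.

    Returns (type_text, default_text); default_text is None when the
    field carries no ', default=' part.
    """
    type_part, _, default_part = type_field.partition(", default=")
    return type_part.strip(), (default_part.strip() or None)


defaults = parameter_defaults(FakeLinearRegression)
print(defaults)
print(split_doc_type("int, default=None"))
```

With the real class, the same parameter_defaults helper should work on sklearn.linear_model.LinearRegression, and split_doc_type could be applied to the type attribute of each entry in doc['Parameters'].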