I am writing a simple application using the Python sklearn library, and I need to parse the docstring of any sklearn estimator. I am not familiar with reStructuredText, but some quick research on the sklearn "Contributing documentation" page suggests that these docstrings are in reStructuredText format. Following from this question, I have tried the following (using the support vector classifier SVC as an example):
from sklearn.svm import SVC
from docutils.core import publish_string
print(publish_string(SVC.__doc__, writer_name='html'))
For anyone who requires it, the raw docstring is
"C-Support Vector Classification.\n\n The implementations is a based on libsvm. The fit time complexity\n is more than quadratic with the number of samples which makes it hard\n to scale to dataset with more than a couple of 10000 samples.\n\n The multiclass support is handled according to a one-vs-one scheme.\n\n For details on the precise mathematical formulation of the provided\n kernel functions and how `gamma`, `coef0` and `degree` affect each,\n see the corresponding section in the narrative documentation:\n :ref:`svm_kernels`.\n\n .. The narrative documentation is available at http://scikit-learn.org/\n\n Parameters\n ----------\n C : float, optional (default=1.0)\n Penalty parameter C of the error term.\n\n kernel : string, optional (default='rbf')\n Specifies the kernel type to be used in the algorithm.\n It must be one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or\n a callable.\n If none is given, 'rbf' will be used. If a callable is given it is\n used to precompute the kernel matrix.\n\n degree : int, optional (default=3)\n Degree of the polynomial kernel function ('poly').\n Ignored by all other kernels.\n\n gamma : float, optional (default=0.0)\n Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.\n If gamma is 0.0 then 1/n_features will be used instead.\n\n coef0 : float, optional (default=0.0)\n Independent term in kernel function.\n It is only significant in 'poly' and 'sigmoid'.\n\n probability: boolean, optional (default=False)\n Whether to enable probability estimates. This must be enabled prior\n to calling `fit`, and will slow down that method.\n\n shrinking: boolean, optional (default=True)\n Whether to use the shrinking heuristic.\n\n tol : float, optional (default=1e-3)\n Tolerance for stopping criterion.\n\n cache_size : float, optional\n Specify the size of the kernel cache (in MB)\n\n class_weight : {dict, 'auto'}, optional\n Set the parameter C of class i to class_weight[i]*C for\n SVC. 
If not given, all classes are supposed to have\n weight one. The 'auto' mode uses the values of y to\n automatically adjust weights inversely proportional to\n class frequencies.\n\n verbose : bool, default: False\n Enable verbose output. Note that this setting takes advantage of a\n per-process runtime setting in libsvm that, if enabled, may not work\n properly in a multithreaded context.\n\n max_iter : int, optional (default=-1)\n Hard limit on iterations within solver, or -1 for no limit.\n\n random_state : int seed, RandomState instance, or None (default)\n The seed of the pseudo random number generator to use when\n shuffling the data for probability estimation.\n\n Attributes\n ----------\n `support_` : array-like, shape = [n_SV]\n Index of support vectors.\n\n `support_vectors_` : array-like, shape = [n_SV, n_features]\n Support vectors.\n\n `n_support_` : array-like, dtype=int32, shape = [n_class]\n number of support vector for each class.\n\n `dual_coef_` : array, shape = [n_class-1, n_SV]\n Coefficients of the support vector in the decision function. For multiclass, coefficient for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the section about multi-class classification in the SVM section of the User Guide for details.\n\n `coef_` : array, shape = [n_class-1, n_features]\n Weights asigned to the features (coefficients in the primal\n problem). 
This is only available in the case of linear kernel.\n\n `coef_` is readonly property derived from `dual_coef_` and\n `support_vectors_`\n\n `intercept_` : array, shape = [n_class * (n_class-1) / 2]\n Constants in decision function.\n\n Examples\n --------\n >>> import numpy as np\n >>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])\n >>> y = np.array([1, 1, 2, 2])\n >>> from sklearn.svm import SVC\n >>> clf = SVC()\n >>> clf.fit(X, y) #doctest: +NORMALIZE_WHITESPACE\n SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,\n gamma=0.0, kernel='rbf', max_iter=-1, probability=False,\n random_state=None, shrinking=True, tol=0.001, verbose=False)\n >>> print(clf.predict([[-0.8, -1]]))\n [1]\n\n See also\n --------\n SVR\n Support Vector Machine for Regression implemented using libsvm.\n\n LinearSVC\n Scalable Linear Support Vector Machine for classification\n implemented using liblinear. Check the See also section of\n LinearSVC for more comparison element.\n\n "
However, I get a parser error:
<string>:9: (ERROR/3) Unknown interpreted text role "ref".
<string>:17: (SEVERE/4) Unexpected section title.
Parameters
----------
Traceback (most recent call last):
File "<ipython-input-22-2ceadc2dc730>", line 1, in <module>
publish_string(SVC.__doc__)
File "C:\Anaconda3\lib\site-packages\docutils\core.py", line 414, in publish_string
enable_exit_status=enable_exit_status)
File "C:\Anaconda3\lib\site-packages\docutils\core.py", line 662, in publish_programmatically
output = pub.publish(enable_exit_status=enable_exit_status)
File "C:\Anaconda3\lib\site-packages\docutils\core.py", line 217, in publish
self.settings)
File "C:\Anaconda3\lib\site-packages\docutils\readers\__init__.py", line 72, in read
self.parse()
File "C:\Anaconda3\lib\site-packages\docutils\readers\__init__.py", line 78, in parse
self.parser.parse(self.input, document)
File "C:\Anaconda3\lib\site-packages\docutils\parsers\rst\__init__.py", line 172, in parse
self.statemachine.run(inputlines, document, inliner=self.inliner)
File "C:\Anaconda3\lib\site-packages\docutils\parsers\rst\states.py", line 170, in run
input_source=document['source'])
File "C:\Anaconda3\lib\site-packages\docutils\statemachine.py", line 239, in run
context, state, transitions)
File "C:\Anaconda3\lib\site-packages\docutils\statemachine.py", line 460, in check_line
return method(match, context, next_state)
File "C:\Anaconda3\lib\site-packages\docutils\parsers\rst\states.py", line 1135, in indent
elements = self.block_quote(indented, line_offset)
File "C:\Anaconda3\lib\site-packages\docutils\parsers\rst\states.py", line 1150, in block_quote
self.nested_parse(blockquote_lines, line_offset, blockquote)
File "C:\Anaconda3\lib\site-packages\docutils\parsers\rst\states.py", line 282, in nested_parse
node=node, match_titles=match_titles)
File "C:\Anaconda3\lib\site-packages\docutils\parsers\rst\states.py", line 195, in run
results = StateMachineWS.run(self, input_lines, input_offset)
File "C:\Anaconda3\lib\site-packages\docutils\statemachine.py", line 239, in run
context, state, transitions)
File "C:\Anaconda3\lib\site-packages\docutils\statemachine.py", line 460, in check_line
return method(match, context, next_state)
File "C:\Anaconda3\lib\site-packages\docutils\parsers\rst\states.py", line 2720, in underline
source=src, line=srcline)
File "C:\Anaconda3\lib\site-packages\docutils\utils\__init__.py", line 235, in severe
return self.system_message(self.SEVERE_LEVEL, *args, **kwargs)
File "C:\Anaconda3\lib\site-packages\docutils\utils\__init__.py", line 193, in system_message
raise SystemMessage(msg, level)
SystemMessage: <string>:17: (SEVERE/4) Unexpected section title.
Parameters
----------
All I really want is a way to convert the docstrings of sklearn objects into HTML without having to write a full-fledged parser of my own. If that is not possible, any suggestions for writing the parser are welcome. Thanks in advance.
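A possible lead, offered as a sketch rather than a confirmed fix: the `block_quote` frame in the traceback suggests that docutils is treating the indented body of the raw `__doc__` string as a block quote, where section titles such as `Parameters` are not allowed. The standard library's `inspect.cleandoc` strips that common indentation. The `raw_doc` string and the `dedent_docstring` helper below are hypothetical stand-ins for illustration, not part of sklearn or docutils.

```python
import inspect

# Hypothetical stand-in for SVC.__doc__: the first line is flush left and the
# rest is indented, as is typical for a docstring written inside a class body.
raw_doc = (
    "C-Support Vector Classification.\n"
    "\n"
    "    Parameters\n"
    "    ----------\n"
    "    C : float, optional (default=1.0)\n"
    "        Penalty parameter C of the error term.\n"
)


def dedent_docstring(doc):
    """Remove the common leading indentation from a docstring (hypothetical helper)."""
    return inspect.cleandoc(doc)


clean_doc = dedent_docstring(raw_doc)
# The section title now starts in column 0; nested text keeps its relative indent.
print(clean_doc)
```

The dedented text could then be tried with `publish_string(clean_doc, writer_name='html')`; `inspect.getdoc(SVC)` applies the same cleaning directly to the class. Since `:ref:` is a Sphinx role rather than a plain docutils one, the "Unknown interpreted text role" message would presumably still appear, and whether the rest of the docstring then renders acceptably has not been verified here.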