Replace Markdown heading tags with custom tags in Python-Markdown


We want to replace the default h tags that Markdown generates from # headings with a custom HTML tag. To convert Markdown to HTML, we use the Python library Markdown.

We tried registering an extension that detects H1 elements with the regular expression (#) (.*).

import markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import SimpleTagPattern

class CustomHeadings(Extension):
    def extendMarkdown(self, md, md_globals):
        H1_RE = r'(#) (.*)'

        h1_tag = SimpleTagPattern(H1_RE, 'span class="h1"')
        md.inlinePatterns['h1'] = h1_tag

md_extensions = [CustomHeadings()]

# [...]

def ds_custom_markdown_parse(value):
    return markdown.markdown(value, extensions=md_extensions)

We want h{1-6} elements rendered as span class="h{1-6}". However, the Markdown parser still converts the string # This is a h1 to <h1>This is a h1</h1>. We expect the output to be <span class="h1">This is a h1</span>.

Answered by Waylan (best answer)

Headings are block-level elements and therefore are not parsed by inlinePatterns. Before running the inlinePatterns, Python-Markdown runs the BlockParser, which converts all of the block-level elements of the document into an ElementTree object. Each block-level element is then passed through the inlinePatterns one at a time, and the span-level elements are parsed.

For example, given your heading # This is a h1, the BlockParser has already converted it to an H tag, <h1>This is a h1</h1>, and the inlinePatterns only see the text content of that tag: This is a h1.
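To make the ordering concrete, here is a small sketch that runs only the block parser and inspects the tree before any inline pattern has run. It assumes Python-Markdown 3.x, and the sample heading with emphasis markers is a made-up input for illustration, not from the question.

```python
import markdown

# Build a Markdown instance with the default processors.
md = markdown.Markdown()

# Run ONLY the block parser on the raw lines (no inline patterns yet).
# The sample text is a hypothetical input chosen for illustration.
tree = md.parser.parseDocument(['# This is a *h1*'])
root = tree.getroot()

# The first child of the root is the heading element created by the block parser.
heading = root[0]

# The tag is already an h-tag, while the emphasis markers are still raw text,
# because inline patterns have not been applied at this stage.
print(heading.tag, repr(heading.text))
```

An inline pattern therefore never gets to see the leading # at all; by the time it runs, only the heading's text content is left.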

You have a few options for addressing this:

  1. You could override the BlockProcessors which parse headings so that they create the elements you desire from the get-go.
  2. Or you could leave the existing block parser in place and create a TreeProcessor which steps through the completed ElementTree object and alters the elements by redefining the tag names in the relevant elements.

Option 2 should be much simpler and is, in fact, the method used by a few existing extensions.
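A minimal sketch of option 2 might look like the following. It assumes Python-Markdown 3.x (where extendMarkdown takes only md and processors are added with register); the class name HeadingToSpanTreeprocessor, the registry name 'heading_to_span', and the priority value 5 are illustrative choices, not something prescribed by the library.

```python
import re

import markdown
from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor

# Matches the tag names h1 through h6 and captures the level digit.
HEADING_TAG_RE = re.compile(r'^h([1-6])$')


class HeadingToSpanTreeprocessor(Treeprocessor):
    """Rename h1-h6 elements to span elements with class="h1".."h6"."""

    def run(self, root):
        for el in root.iter():
            # Skip any node whose tag is not a plain string.
            if not isinstance(el.tag, str):
                continue
            match = HEADING_TAG_RE.match(el.tag)
            if match is None:
                continue
            level_class = 'h' + match.group(1)
            existing = el.get('class')
            el.tag = 'span'
            # Keep any class already on the element and append ours.
            el.set('class', f'{existing} {level_class}' if existing else level_class)


class CustomHeadings(Extension):
    def extendMarkdown(self, md):
        # Priority 5 is an assumption: low enough to run after the built-in
        # inline treeprocessor, so the heading text is fully parsed first.
        md.treeprocessors.register(
            HeadingToSpanTreeprocessor(md), 'heading_to_span', 5
        )


html = markdown.markdown(
    '# This is a h1\n\n## This is a h2',
    extensions=[CustomHeadings()],
)
print(html)
```

Because the treeprocessor only renames elements that the block parser has already built, every heading syntax the parser understands is covered without writing any new regular expressions. Note that a span is an inline element, so styling it as a block is left to CSS.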

Full disclosure: I am the lead developer of the Python-Markdown project.