Can Python-Markdown support imageboard-style links?

I would like to add an additional syntax to Python-Markdown: if n is a positive integer, >>n should expand into <a href="#post-n">n</a>. (Double angle brackets (>>) are a conventional syntax for creating links in imageboard forums.)

By default, Python-Markdown expands >>n into nested blockquotes: <blockquote><blockquote>n</blockquote></blockquote>. Is there a way to create links out of >>n while preserving the rest of the blockquote's default behavior? In other words, if x is a positive integer, >>x should expand into a link, but if x is not a positive integer, >>x should still expand into nested blockquotes.

I have read the relevant wiki article: Tutorial 1 Writing Extensions for Python Markdown. Based on what I learned in the wiki, I wrote a custom extension:

import markdown
import xml.etree.ElementTree as ET
from markdown.extensions import Extension
from markdown.inlinepatterns import Pattern


class ImageboardLinkPattern(Pattern):
    def handleMatch(self, match):
        number = match.group('number')
        # Create link.
        element = ET.Element('a', attrib={'href': f'#post-{number}'})
        element.text = f'>>{number}'
        return element


class ImageboardLinkExtension(Extension):
    def extendMarkdown(self, md):
        IMAGEBOARD_LINK_RE = '>>(?P<number>[1-9][0-9]*)'
        imageboard_link = ImageboardLinkPattern(IMAGEBOARD_LINK_RE)
        md.inlinePatterns['imageboard_link'] = imageboard_link


html = markdown.markdown('>>123',
                         extensions=[ImageboardLinkExtension()])
print(html)

However, >>123 still produces <blockquote><blockquote>123</blockquote></blockquote>. What is wrong with the implementation above?

Best answer, by Waylan:

The problem is that your new syntax conflicts with the preexisting blockquote syntax. Your extension would presumably work if it were ever called. However, due to the conflict, that never happens. Note that there are five types of processors. As documented:

  • Preprocessors alter the source before it is passed to the parser.
  • Block Processors work with blocks of text separated by blank lines.
  • Tree Processors modify the constructed ElementTree.
  • Inline Processors are common tree processors for inline elements, such as *strong*.
  • Postprocessors munge the output of the parser just before it is returned.

Of importance here is that the processors run in that order. In other words, all block processors run before any inline processor does. Therefore, the blockquote block processor sees your input first, strips both angle brackets, and wraps the rest of the line in nested blockquote tags. By the time your inline processor sees the text, the >> is gone, so your regex no longer matches and your handleMatch method is never called.
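To see this ordering in practice, here is a minimal check. It assumes only that the Python-Markdown package is installed and that no extensions are loaded; the variable names are just for illustration.

```python
import markdown

# With no extensions loaded, the blockquote block processor consumes both
# angle brackets before any inline processor gets to look at the text.
html = markdown.markdown('>>123')
print(html)

# Count the opening blockquote tags produced for the two angle brackets.
nested_quotes = html.count('<blockquote>')
print(nested_quotes)

# No escaped angle bracket survives into the output, so an inline regex
# looking for ">>" has nothing left to match.
print('&gt;' in html)
```

The point is that the inline stage only ever receives the text 123, which is why a pattern anchored on >> cannot fire.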

That being said, an inline processor is the correct way to implement a link syntax. However, you would need to do one of two things to make it work.

  1. Alter the syntax so that it does not clash with any preexisting syntax; or
  2. Alter the blockquote behavior to avoid the conflict.

Personally, I would recommend option 1, but I understand you are trying to implement a preexisting syntax from another environment. So, if you want to explore option 2, then I would suggest perhaps making the blockquote syntax a little more strict. For example, while it is not required, the recommended syntax is to always insert a space after the angle bracket in a blockquote. It should be relatively simple to alter the BlockquoteProcessor to require the space, which would cause your syntax to no longer clash.

This is actually pretty simple. As you may note, the entire syntax is defined via a rather simple regex:

RE = re.compile(r'(^|\n)[ ]{0,3}>[ ]?(.*)')

You simply need to rewrite that so that zero spaces are no longer accepted (a literal "> " rather than ">[ ]?"). First import and subclass the existing processor, then override the regex:

import re

from markdown.blockprocessors import BlockquoteProcessor


class CustomBlockquoteProcessor(BlockquoteProcessor):
    # Same as the default pattern, but the space after ">" is now mandatory.
    RE = re.compile(r'(^|\n)[ ]{0,3}> (.*)')
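As a quick sanity check on the stricter pattern, the two regexes can be compared directly with the standard re module, without involving Markdown at all. The names DEFAULT_RE, STRICT_RE, and samples below are only illustrative.

```python
import re

# The default blockquote pattern and the stricter variant side by side.
DEFAULT_RE = re.compile(r'(^|\n)[ ]{0,3}>[ ]?(.*)')
STRICT_RE = re.compile(r'(^|\n)[ ]{0,3}> (.*)')

samples = ['>>123', '> quoted text', '>quoted text']

# Map each sample to (matches default pattern, matches strict pattern).
results = {
    text: (bool(DEFAULT_RE.search(text)), bool(STRICT_RE.search(text)))
    for text in samples
}

for text, (default_hit, strict_hit) in results.items():
    print(f'{text!r}: default={default_hit}, strict={strict_hit}')
```

Only the sample with a space after the angle bracket is still treated as a blockquote by the strict pattern, which is what frees >>123 for the inline processor.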

Finally, you just need to tell Markdown to use your custom class rather than the default. Add the following to the extendMarkdown method of your ImageboardLinkExtension class:

md.parser.blockprocessors.register(CustomBlockquoteProcessor(md.parser), 'quote', 20)

Now the blockquote syntax will no longer clash with your link syntax, and your code will get an opportunity to run on the text. Just remember to always include the now-required space in any actual blockquotes. This also means that >>x without spaces no longer produces nested blockquotes for any x; nested quotes must be written as > > x.