I am trying to create a Lua filter that preserves HTML comments (but no other HTML elements).
local function starts_with(start, str)
  return str:sub(1, #start) == start
end

function RawInline(el)
  if starts_with('<!--', el.text) then
    return el
  else
    return nil
  end
end

return {{Inline = RawInline}}
(Based on mb21's answer here: From HTML to Markdown: As clean Markdown markup as possible, and to preserve HTML comments.)
It doesn't currently work. What might be the problem? This is the command I run:
pandoc -f html+raw_html from.html -o to.md -t gfm --lua-filter preserve-comments.lua
There are two small problems that prevent this filter from working. I list them below, with an explanation and a solution for each.
The main issue is return {{Inline = RawInline}}. This causes the RawInline function to be called for all Inline elements, such as Str, Emph, Space, etc. This is causing issues, because some elements don't have a .text attribute, and calling starts_with with nil as the second argument triggers an error.

The solution for this is to either use return {{RawInline = RawInline}}, or to leave the line out entirely. Both solutions are equivalent due to the way pandoc constructs filters from global functions if no explicit filter table is returned.

The RawInline function does nothing, because return el and return nil do the same thing in this case. Not returning anything from a filter function causes pandoc to keep the object unaltered. Deleting an object is possible by returning {}.

To summarize, this should work:
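A minimal sketch of the corrected filter, put together from the two fixes described above rather than quoted from elsewhere. It assumes the "leave the line out" variant, so pandoc picks up the global RawInline function, and it returns {} to delete raw HTML that is not a comment:

```lua
-- True if `str` begins with the prefix `start`.
local function starts_with(start, str)
  return str:sub(1, #start) == start
end

-- Called only for RawInline elements, because the function is a global
-- named after the element type and no filter table is returned.
function RawInline(el)
  if starts_with('<!--', el.text) then
    return el   -- keep HTML comments unchanged
  else
    return {}   -- an empty list deletes the element
  end
end
```

Since only RawInline elements reach the function, el.text is always a string here, so the nil error from the original version should no longer occur.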
To ensure that no HTML at all is included in the output, we can use gfm-raw_html as the output format, i.e., we disable the raw_html extension. This will also suppress any HTML comment, so we modify the filter to pretend that these comments are raw Markdown, which will be included verbatim.
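One way the modified filter could look is sketched below. It is an illustration under assumptions, not a quoted listing: it reuses the starts_with helper from the question and re-tags each comment with pandoc.RawInline using the format name 'markdown', while every other raw inline is still deleted:

```lua
-- True if `str` begins with the prefix `start`.
local function starts_with(start, str)
  return str:sub(1, #start) == start
end

function RawInline(el)
  if starts_with('<!--', el.text) then
    -- Re-tag the comment as raw Markdown so that it survives even
    -- though the raw_html extension is disabled in the output format.
    return pandoc.RawInline('markdown', el.text)
  end
  return {}  -- drop all other raw inline content
end
```

The command from the question would then only need the output format changed: pandoc -f html+raw_html from.html -o to.md -t gfm-raw_html --lua-filter preserve-comments.lua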