Is it possible to remove unused footnote references using Pandoc?


I wonder whether it is possible to remove unused footnote references from a Markdown document using Pandoc, either with a built-in feature or with a custom filter.

2

There are 2 answers

1
TristanMas On

I think you'll need to write a custom Lua filter. As far as I know, Pandoc doesn't have a built-in feature for this.

You can approach it like this:

  1. Lua Script: Create a Lua script that Pandoc can use. It should check all footnote references in your document.

  2. Filter Logic: The script should identify which footnotes are referenced in the text and remove all those that aren't used.

  3. Run Pandoc with the Lua Filter: Use Pandoc to process your Markdown file with this Lua script as a filter.

The main step is writing the Lua script.
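The check described in the steps above can be sketched outside pandoc as well. The following is not a Lua filter but a plain-text pre-pass in Python, and it only approximates pandoc's footnote syntax with regular expressions. The function name `remove_unused_footnotes` is made up for this illustration. It assumes that definitions start at the beginning of a line and that continuation lines are indented by four spaces or a tab; it does not handle multi-paragraph notes or footnote-like text inside code blocks.

```python
import re

# "[^key]:" at the start of a line opens a footnote definition.
DEFINITION = re.compile(r"^\[\^([^\]\s]+)\]:")
# "[^key]" anywhere else is treated as a footnote reference.
REFERENCE = re.compile(r"\[\^([^\]\s]+)\]")


def remove_unused_footnotes(markdown: str) -> str:
    """Drop footnote definitions whose key is never referenced (rough sketch)."""
    lines = markdown.split("\n")

    # Pass 1: collect every referenced key. The leading definition marker is
    # stripped first so that a definition does not count as its own reference.
    used = set()
    for line in lines:
        body = DEFINITION.sub("", line)
        used.update(REFERENCE.findall(body))

    # Pass 2: skip unused definitions together with their indented
    # continuation lines.
    kept = []
    skipping = False
    for line in lines:
        match = DEFINITION.match(line)
        if match:
            skipping = match.group(1) not in used
        elif skipping and not line.startswith(("    ", "\t")):
            skipping = False
        if not skipping:
            kept.append(line)
    return "\n".join(kept)
```

Note that a reference appearing only inside an unused note still counts as a use here, so chains of unused notes would need repeated passes.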

0
Jerome WAGNER On

Your question could refer to two different problems:

dangling notes

This is the case when a footnote is created but there is no reference to it in the document content.

I tried

Here is a footnote reference,[^1].

[^1]: Here is the footnote.

[^2]: Here is the unused footnote.

on https://pandoc.org/try/ with the conversion set to "markdown to markdown" (pandoc --from markdown --to markdown).

The result is

Here is a footnote reference,[^1].

[^1]: Here is the footnote.

with a warning "Note with key '2' defined at line 5 column 1 but not used."

So maybe you could run this "markdown to markdown" conversion first, or look at how the conversion manages to remove the unused note.
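To run the same round trip locally instead of on the try page, a small wrapper could look like the sketch below. It assumes a `pandoc` executable is available on the PATH. The helper names `pandoc_roundtrip_command` and `roundtrip_markdown` are invented for this example; only the `--from markdown --to markdown` flags come from the command shown above.

```python
import subprocess


def pandoc_roundtrip_command(pandoc: str = "pandoc") -> list:
    """Build the markdown-to-markdown command line used above."""
    return [pandoc, "--from", "markdown", "--to", "markdown"]


def roundtrip_markdown(text: str, pandoc: str = "pandoc") -> str:
    """Pipe Markdown through pandoc and return the regenerated Markdown.

    Assumes pandoc is installed; raises CalledProcessError if it fails.
    """
    result = subprocess.run(
        pandoc_roundtrip_command(pandoc),
        input=text,           # pandoc reads stdin when no input file is given
        capture_output=True,  # warnings such as unused notes arrive on stderr
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `roundtrip_markdown` on the sample document above should then return the text without the unused note, provided the installed pandoc behaves like the version on the try page.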

In the pandoc source code, you can find

src/Text/Pandoc/Readers/Markdown.hs#L332

-- check for notes with no corresponding note references
checkNotes :: PandocMonad m => MarkdownParser m ()
checkNotes = do
  st <- getState
  let notesUsed = stateNoteRefs st
  let notesDefined = M.keys (stateNotes' st)
  mapM_ (\n -> unless (n `Set.member` notesUsed) $
                case M.lookup n (stateNotes' st) of
                   Just (pos, _) -> report (NoteDefinedButNotUsed n pos)
                   Nothing -> throwError $
                     PandocShouldNeverHappenError "note not found")
         notesDefined

So I suspect that this is not your issue, because pandoc seems to handle this scenario automatically.

The other scenario is the

dangling reference

When you have a Markdown document with a lot of footnotes at the bottom, you may want to remove the footnotes, but then their references are still left in the document.

I tried

Here is a footnote reference.[^1]

in pandoc for a markdown to markdown conversion and I get

Here is a footnote reference.\[\^1\]

So it seems that a footnote reference without a matching note is converted to literal text.
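Given that output, one rough workaround outside pandoc is a text post-pass over the converted Markdown that deletes the escaped leftovers. The sketch below assumes the leftovers always have the escaped form `\[\^key\]` shown above. The function name `strip_escaped_dangling_refs` is hypothetical. It cannot tell a dangling reference from text that was deliberately written as a literal `[^1]`, so it would delete both.

```python
import re

# Matches an escaped, unresolved footnote reference such as \[\^1\]
ESCAPED_DANGLING_REF = re.compile(r"\\\[\\\^[^\]\\\s]+\\\]")


def strip_escaped_dangling_refs(markdown: str) -> str:
    """Remove escaped footnote references left behind by the round trip."""
    return ESCAPED_DANGLING_REF.sub("", markdown)


if __name__ == "__main__":
    print(strip_escaped_dangling_refs(r"Here is a footnote reference.\[\^1\]"))
```

Resolved references such as `[^1]` are not escaped in the output, so this pattern leaves them alone.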

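In fact, the pandoc AST makes no distinction between a note and a note reference: a resolved reference simply becomes a note element holding the note's content. An unresolved reference therefore has nothing to map to, which makes this case a bit difficult to handle.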

The only way I can see would be to create a new MarkdownCustom parser that would patch the current Markdown Reader.

src/Text/Pandoc/Readers/Markdown.hs#L2029-L2051

note :: PandocMonad m => MarkdownParser m (F Inlines)
note = try $ do
  guardEnabled Ext_footnotes
  ref <- noteMarker
  updateState $ \st -> st{ stateNoteRefs = Set.insert ref (stateNoteRefs st)
                         , stateNoteNumber = stateNoteNumber st + 1 }
  noteNum <- stateNoteNumber <$> getState
  return $ do
    notes <- asksF stateNotes'
    case M.lookup ref notes of
        Nothing       -> return $ B.str $ "[^" <> ref <> "]"
        Just (_pos, contents) -> do
          st <- askF
          -- process the note in a context that doesn't resolve
          -- notes, to avoid infinite looping with notes inside
          -- notes:
          let contents' = runF contents st{ stateNotes' = M.empty }
          let addCitationNoteNum c@Citation{} =
                c{ citationNoteNum = noteNum }
          let adjustCite (Cite cs ils) =
                Cite (map addCitationNoteNum cs) ils
              adjustCite x = x
          return $ B.note $ walk adjustCite contents'

where you would need to replace return $ B.str $ "[^" <> ref <> "]" with return $ B.str $ ""

I don't think you can currently do this with a built-in feature (except implementing a new Reader) or a custom filter.

You could probably add a new option to the pandoc Markdown Reader that would enable this no-op replacement as a configuration option.

Note: this is purely a thought experiment and has not been tested.