Is there a sane way to produce Word or OpenOffice documents from light markup formats like Asciidoc or Markdown?

I am currently in the process of writing a thesis at university. Both the university and my assigned supervisor require the thesis to:

  • be in MS Word document format, i.e. doc or docx
  • contain some formal boilerplate at the beginning (a title page, a formal statement of the thesis' originality, etc.), the templates for which are only available in said format and which must be followed exactly in layout, fonts, and so on
  • be in a particular font at a particular size (Times New Roman 12pt to be exact)
  • fulfill other stylistic requirements (paragraphs must be indented with a tab at their beginning)

As a programmer, I've been spoiled by modern plaintext editors and have been avoiding WYSIWYG editors like Word or LibreOffice Writer like the plague for years. Doing anything more complicated than filling out a form is torture with this kind of tool, as I always end up fighting the editor instead of focusing on the content I'm writing. Changing the style of one paragraph might change all paragraphs, and adding one character or line too many might blow up the whole intricate layout and strew text over two pages where there used to be one. It's an extremely unintuitive and frustrating experience all around. Compressed formats like ODT or DOCX are also not VCS-friendly, which is a drawback, as I'd like to be able to keep a copy in a Git repo and see readable diffs.

As my thesis is going to be about software, I will likely need inline monospace sections, links to internet sources, and code listing blocks, preferably with language-appropriate syntax highlighting. Doing these things manually in LO or Word would be extremely tedious, repetitive, and error-prone. Asciidoc seems to fit my use case perfectly, but neither ODT nor any Word-compatible format is supported as an asciidoctor output target. Markdown would also be acceptable, though it lacks the ability to manually mark page breaks.

Has anyone had experience with converting either Asciidoc or Markdown to DOCX, ODT, or a compatible format? Right now I'm seriously considering writing my own plain text/light markup → FODT (flat XML ODT) converter and then manually converting the output of that to DOCX because in all likelihood it would take me less time than learning how to use a WYSIWYG editor effectively.

On a tangent, is there a way to programmatically merge two DOCX or ODT documents? If so, I could manually fill out the boilerplate templates and then join them to the document proper.

There are 2 answers

kjhughes On

Sounds like you'd like pandoc:

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library.

Pandoc can convert between numerous markup and word processing formats, including, but not limited to, various flavors of Markdown, HTML, LaTeX and Word docx. For the full lists of input and output formats, see the --from and --to options below. Pandoc can also produce PDF output: see creating a PDF, below.

Pandoc’s enhanced version of Markdown includes syntax for tables, definition lists, metadata blocks, footnotes, citations, math, and much more. See below under Pandoc’s Markdown.
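
For the requirements in the question (DOCX output, Times New Roman 12pt, indented paragraphs), a minimal sketch of driving pandoc from a script might look like the following. It relies on pandoc's `--reference-doc` option, which takes the styles for the generated DOCX from an existing Word file. The file names and the helper functions `pandoc_docx_command` and `convert` are made up for illustration, and the sketch assumes `pandoc` is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def pandoc_docx_command(source: Path, output: Path,
                        reference_doc: Optional[Path] = None,
                        input_format: str = "markdown") -> List[str]:
    """Build the pandoc argument list for a source -> DOCX conversion.

    Hypothetical helper: only the pandoc flags themselves are real.
    """
    cmd = ["pandoc", "--from", input_format, "--to", "docx",
           "--output", str(output)]
    if reference_doc is not None:
        # pandoc copies the styles (fonts, sizes, indents) from this file;
        # its text content is ignored.
        cmd.append(f"--reference-doc={reference_doc}")
    cmd.append(str(source))
    return cmd


def convert(source: Path, output: Path,
            reference_doc: Optional[Path] = None,
            input_format: str = "markdown") -> None:
    """Run pandoc and raise if it is missing or exits with an error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    cmd = pandoc_docx_command(source, output, reference_doc, input_format)
    subprocess.run(cmd, check=True)


# Example call (assumed file names):
# convert(Path("thesis.md"), Path("thesis.docx"), Path("university-styles.docx"))
```

The source stays in plain text under Git, and the university's fonts and paragraph styles live in the reference document, which only has to be set up once in Word. For Asciidoc sources, one route that may work is to let asciidoctor emit DocBook first (`asciidoctor -b docbook thesis.adoc`) and then pass `input_format="docbook"` here.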

Enrique Motilla On

There is a wonderful DOCX generator from a programmer's perspective which, IMHO, would be more suitable than Pandoc as suggested by @kjhughes. Take a look at https://docx.js.org/ and the examples in the GitHub repository at https://github.com/dolanmiu/docx/tree/master/demo. It has an amazing set of examples for styles, sections, image embedding, and so on, so I would try it rather than working with markup from a predefined tool. It might need a special flavor of Markdown to add all the features you need.

If your final output will be a PDF, you could use FODT as a template with nunjucks for the text placeholders, and then use the Docker-based converter from https://thecodingmachine.github.io/gotenberg/#introduction, which works great and fast using a simple REST call.
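
To illustrate the placeholder idea only, here is a rough sketch in Python. It uses the standard library's `string.Template` as a stand-in for nunjucks (which is a JavaScript library), and the XML fragment, its style names, and the `fill_template` helper are hypothetical; a real FODT file has a full document envelope around such paragraphs.

```python
from string import Template
from typing import Mapping
from xml.sax.saxutils import escape

# Hypothetical fragment of a flat ODT body with $-style placeholders.
FODT_FRAGMENT = Template(
    '<text:p text:style-name="Title">$title</text:p>\n'
    '<text:p text:style-name="Author">$author</text:p>'
)


def fill_template(template: Template, values: Mapping[str, object]) -> str:
    """Substitute placeholders, XML-escaping every value first.

    Raises KeyError if the template names a placeholder that is not supplied.
    """
    safe = {key: escape(str(value)) for key, value in values.items()}
    return template.substitute(safe)


filled = fill_template(FODT_FRAGMENT,
                       {"title": "Parsing & Printing", "author": "A. Student"})
print(filled)
```

Escaping the values matters because characters such as `&` or `<` in a title would otherwise produce invalid XML that the converter cannot open.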

Your idea of converting markup to FODT is also nice, but if it is not part of your thesis it may take you extra effort. It would be very welcome for others to use, though, so please share your findings too.

Good luck with your thesis.