PyMuPDF: Page break functionality, respecting links across multiple pages


Context

I'm writing a program that converts Markdown to HTML and then to PDF, with one extra feature: page breaks.

I have managed to get page breaks working; here is the relevant part of the code:

...
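def rectfn(rect_num, filled):
    # Full A4 page as the mediabox; content area inset by 36 pt on every side.
    mediabox = fitz.paper_rect("A4")
    content_rect = mediabox + (36, 36, -36, -36)
    return mediabox, content_rect, None  # None: no extra transformation matrix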

markdown_parser = markdown.markdown
pdf_document = fitz.open()

content_parts = re.split(r'<pagebreak>', self.content)

for part in content_parts:
    html_content = markdown_parser(part, extensions=list(self.plugins))
    story = fitz.Story(html=html_content, user_css=self.custom_css, archive=".")

    document = story.write_with_links(rectfn)
    pdf_document.insert_pdf(document)

pdf_document.save(file_path)
pdf_document.close()
...

However, this has broken clickable links in the final PDF whenever a link points to a target in a different part of the document (that is, on the other side of a page break).

This is because story.write_with_links() is called separately for each part, and each part is its own Story. If a part contains a link whose destination lives in another part, the destination can't be found and the call fails with:

RuntimeError: No destination with id=quotes, required by position_from...

Extra info in case it helps

I was able to make links work across the entire document, between multiple pages, but only by sacrificing the page breaks: I wrote the entire HTML document to PDF with a single story, like this:

story = fitz.Story(html=html_content,
                   user_css=self.custom_css,
                   archive=".")

document = story.write_with_links(rectfn)
document.save(file_path)
document.close()

As I said, this loses the page breaks, so it doesn't solve what I actually want.

Question

Is there a way to keep the page breaks and have links between pages working at the same time?

I have already read the docs and asked my question over on the PyMuPDF discussion page, but I'm still looking for an answer.
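
One direction that might be worth trying, sketched here under assumptions rather than as a confirmed fix: keep a single Story (so every link destination lives in the same document) and express each break in the HTML itself via the CSS property page-break-before. The helper name join_with_page_breaks and the wrapper markup are hypothetical, and whether the Story engine honors the property for this layout would need to be verified. The snippet only shows the HTML-joining step on already-rendered fragments.

```python
# Hypothetical helper: the name and the wrapper <div> are illustrative only.
# Assumption: the Story HTML engine honors "page-break-before: always".
PAGE_BREAK_STYLE = "page-break-before: always;"


def join_with_page_breaks(html_parts):
    """Join rendered HTML fragments into one document.

    The first fragment is kept as-is; every later fragment is wrapped in a
    <div> that requests a page break before it.
    """
    if not html_parts:
        return ""
    pieces = [html_parts[0]]
    for part in html_parts[1:]:
        pieces.append(f'<div style="{PAGE_BREAK_STYLE}">{part}</div>')
    return "\n".join(pieces)


# Example: a link in the first part pointing at an id in the second part.
parts = [
    '<p><a href="#quotes">Go to quotes</a></p>',
    '<h1 id="quotes">Quotes</h1>',
]
combined_html = join_with_page_breaks(parts)
print(combined_html)
```

The combined string would then be passed to one fitz.Story(html=combined_html, ...) and written with story.write_with_links(rectfn), exactly as in the single-story snippet above, so that the link and its destination are resolved within the same story.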
