Python: HTML string to PDF via pdfkit: keep an image from spanning two pages


I want to convert an HTML string to PDF using pdfkit and Python. The HTML string includes an image. The problem is that the image is split across two pages, as shown below.

[Screenshot: the image split across two pages]

Assuming the image fits on a single page:

  • How can I keep the image from spanning two pages with pdfkit and Python?
  • If pdfkit can't do it, is there another method?

The source is an HTML string, so I can't calculate whether the space left on the current page is enough to hold the image. Any ideas? Thank you.

The code follows. qqaa stands in for the base64 image data. Including the real data would exceed Stack Overflow's 30000-character limit, so the code below won't run as posted. I don't know how to attach the Python script.

import pdfkit

html_str = ''

for i in range(1,15):
    html_str += '<p>many row</p>'

html_str += '<h3>3.1.12 Draw</h3><h4>3.1.13.1 3D</h4><img src="data:image/png;base64,qqaa" alt="3D Structure">'

opt = {'encoding': 'UTF-8', 'orientation': 'Landscape', 'margin-top': '0.5in', 'margin-bottom': '0.5in', 'margin-left': '0.75in', 'margin-right': '0.75in', 'outline-depth': 6, 'header-center': 'whatever', 'header-right': 'Page: [page]/[toPage]', 'header-line': '', 'header-spacing': 2, 'footer-right': 'Date: [date]', 'footer-line': '', 'footer-spacing': 2, 'enable-local-file-access': None}
pdfkit.from_string(html_str, 'out.pdf', options=opt)


Edit: one solution is to put <P style="page-break-before: always"> directly in the HTML string, right before the image, so that the image always starts on a new page.

<P style="page-break-before: always"><img src="data:image/png;base64,qqaa" alt="3D Structure">
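If the HTML string is generated rather than written by hand, the same workaround could be applied programmatically. The following is only a rough sketch: the helper name break_before_images is hypothetical, it closes the paragraph with </p> (which the line above leaves open), and the regular expression assumes that no attribute value inside an <img> tag contains a ">" character.

```python
import re

# Matches a whole <img ...> tag; assumes no attribute value contains '>'.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def break_before_images(html: str) -> str:
    """Wrap every <img> tag in a paragraph that forces a page break before it."""
    return IMG_TAG.sub(
        lambda m: '<p style="page-break-before: always">' + m.group(0) + "</p>",
        html,
    )


html_str = '<h4>3.1.13.1 3D</h4><img src="data:image/png;base64,qqaa" alt="3D Structure">'
html_str = break_before_images(html_str)
```

Note the trade-off: every image then starts a new page, even when it would have fit in the space left on the current one.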



There is 1 answer

Answer by Tranbi

You can try applying the CSS declaration page-break-inside: avoid to img elements.

Edit: create a file img.css with the following:

img {
  page-break-inside: avoid !important;
}

Then pass the file path to pdfkit.from_string through the css argument:

import pdfkit

html_str = ''

for i in range(1,15):
    html_str += '<p>many row</p>'

html_str += '<h3>3.1.12 Draw</h3><h4>3.1.13.1 3D</h4><img src="data:image/png;base64,qqaa" alt="3D Structure">'

opt = {'encoding': 'UTF-8', 'orientation': 'Landscape', 'margin-top': '0.5in', 'margin-bottom': '0.5in', 'margin-left': '0.75in', 'margin-right': '0.75in', 'outline-depth': 6, 'header-center': 'whatever', 'header-right': 'Page: [page]/[toPage]', 'header-line': '', 'header-spacing': 2, 'footer-right': 'Date: [date]', 'footer-line': '', 'footer-spacing': 2, 'enable-local-file-access': None}

pdfkit.from_string(html_str, 'out.pdf', options=opt, css='img.css')

Note: page-break-inside is now a legacy property. The current CSS specification replaces it with break-inside.
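If you would rather not keep a separate img.css file, the rule could also travel inside the HTML string itself. This is a minimal sketch under a few assumptions: the names IMG_RULE, with_inline_css and render are hypothetical, both the legacy and the newer property are declared because it is not certain which one the rendering engine honors, and pdfkit is imported lazily so the string-building part works without it installed.

```python
# Declares both the legacy and the newer property; an engine ignores the one it doesn't know.
IMG_RULE = "img { page-break-inside: avoid !important; break-inside: avoid; }"


def with_inline_css(body_html: str, css: str = IMG_RULE) -> str:
    """Wrap an HTML fragment in a full document that carries the CSS in <head>."""
    return (
        "<!DOCTYPE html><html><head><meta charset='UTF-8'>"
        f"<style>{css}</style></head><body>{body_html}</body></html>"
    )


def render(body_html: str, out_path: str, options=None):
    """Render the fragment to a PDF file; needs pdfkit and wkhtmltopdf installed."""
    import pdfkit  # imported here so with_inline_css works without pdfkit

    return pdfkit.from_string(with_inline_css(body_html), out_path, options=options)
```

Usage would then look like render(html_str, 'out.pdf', options=opt), with html_str and opt built as in the code above.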