How to Concatenate PDFs via Pikepdf and Python without Unnecessary Disk Read-Write?

Question

Asked by Della on 18 April 2023 at 03:36 · 235 views

Current technology stack

img2pdf==0.4.4
pikepdf==7.1.2
Python 3.10
Ubuntu 22.04

The requirement

A PDF file (let's call it static.pdf) exists on disk. Another PDF (let's call it dynamic.pdf) is generated dynamically in memory with the img2pdf library, depending on some user input parameters.

The task is to concatenate these two PDFs into a single one (static.pdf, then dynamic.pdf) and send it as an email attachment via the SMTP library.

Current Solution I am Employing

This is based on the pikepdf documentation.

1. Write dynamic.pdf to disk.
2. Read static.pdf from disk with pikepdf.
3. Read dynamic.pdf from disk with pikepdf.
4. Concatenate them with the list-like API provided by pikepdf; let's call the result final.pdf.
5. Write final.pdf to disk with the pikepdf API.
6. Read it back from disk as bytes with open(file='final.pdf', mode='rb').
7. Attach the bytes to the email message.

What I want

I want to remove all the unnecessary disk I/O: dynamic.pdf is already in memory, and the final result only needs to be attached to the email as bytes (there is no need to persist it on disk). Ideally, the only disk operation should be reading static.pdf.

However, I cannot find much information on the pikepdf site about in-memory concatenation. I am also not certain whether a pikepdf.Pdf object can expose exactly the same bytes I would get if I wrote the PDF to disk and then read it back with Python's built-in open function.

Any ideas around this would be helpful, even if they involve other libraries that offer this functionality. The constraints on other libraries would be:

Plays well with my tech stack (Python, Ubuntu, and it also needs to run on Windows)
FOSS, and trustworthy enough

Original Q&A

There is 1 answer

**Andrew Tapia** · Answer 1 · 2023-04-18T04:51:34+00:00

According to the documentation for pikepdf.Pdf, the Pdf.open and Pdf.save methods accept a file-like object instead of a filename, so you can use io.BytesIO here.

For example,

import io
import img2pdf
from pikepdf import Pdf

def pdf_from_bytes(data):
    return Pdf.open(io.BytesIO(data))

def add_png_to_end(static_pdf_path, png_file_path):
    # Adds a PNG to the end of an existing PDF document and returns the bytes.
    static_pdf = Pdf.open(static_pdf_path)
    png_pdf = pdf_from_bytes(img2pdf.convert(png_file_path))
    new_pdf = Pdf.new()
    new_pdf.pages.extend(static_pdf.pages)
    new_pdf.pages.extend(png_pdf.pages)
    res = io.BytesIO()
    new_pdf.save(res)
    return res.getvalue()
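
The bytes returned above can go straight into the email, with no final.pdf on disk. Below is a minimal sketch using only the standard library's email.message.EmailMessage. The helper name build_message_with_pdf, the subject line, the body text, and the default attachment filename are hypothetical placeholders, not part of the question.

```python
from email.message import EmailMessage


def build_message_with_pdf(pdf_bytes, sender, recipient, filename="final.pdf"):
    # Hypothetical helper: wraps in-memory PDF bytes in an email message.
    msg = EmailMessage()
    msg["Subject"] = "Your document"  # placeholder subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The combined PDF is attached.")  # placeholder body
    # The attachment is built directly from bytes; nothing is written to disk.
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=filename,
    )
    return msg
```

Sending it would then look something like smtplib.SMTP(host).send_message(msg), where the host, port, and any login or TLS steps depend on your mail server.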
