Current technology stack
- img2pdf==0.4.4
- pikepdf==7.1.2
- Python 3.10
- Ubuntu 22.04
The requirement
A pdf file (let's call it static.pdf) exists on disk. Another pdf (let's call it dynamic.pdf) is generated dynamically in memory with the img2pdf library, depending on some user input parameters.
The task is to concatenate these two pdfs as a single one (static.pdf, then dynamic.pdf) and send it as an email attachment via the SMTP library.
Current Solution I am Employing
This is based on the pikepdf documentation.
- Dump dynamic.pdf to disk
- Read static.pdf from disk with pikepdf
- Read dynamic.pdf from disk with pikepdf
- Concatenate them with the list-like API provided by pikepdf; let's call the result final.pdf
- Dump final.pdf to disk with the pikepdf API
- Read it from disk as bytes with open(file='final.pdf', mode='rb')
- Attach the bytes to the email message
What I want
Remove all the unnecessary disk I/O: I already have dynamic.pdf in memory, and the final result only needs to be attached to the email as bytes (no need to persist it on disk). So ideally, the only disk operation should be reading static.pdf.
But I cannot find much information on the pikepdf site about in-memory concatenation. Moreover, I am also not certain whether a pikepdf.Pdf object can expose the exact same bytes as what I would get if I dump the pdf on disk and then read it using python native open function.
So any ideas around this would be helpful, even if there are other libraries that allow this functionality. The constraints on other libraries would be
- Plays well with my tech stack (Python, Ubuntu, and it also needs to run on Windows)
- FOSS, and trustworthy enough
According to the documentation for pikepdf.Pdf, the Pdf.open and Pdf.save methods accept a file-like object instead of a filename, so you can use io.BytesIO here.