Python: Merging PDF files from a list of files with full paths (input1.pdf; input2.pdf; output.pdf)


I would like a script that merges many PDF files, two at a time, with each pair going into a single PDF, based on a list (XLS, TXT, CSV...) like this one:

path1/input_file1.pdf;path2/input_file2.pdf;path3/output_file1.pdf
...
...
...
path100/input_file100.pdf;path200/input_file200.pdf;path300/output_file100.pdf

With this list I would like 100 PDF files as output: input_file1.pdf + input_file2.pdf = output_file1.pdf ... input_file100.pdf + input_file200.pdf = output_file100.pdf

Optional: it would be great if a row could sometimes have only one input PDF (in this case output = input):

path1/input_file1.pdf;;path3/output_file1.pdf

or

;path2/input_file2.pdf;path3/output_file1.pdf

I think Python could be a good way to do this, maybe with the pdfrw library.

Thanks for your help,

Maxence

@Patrick Maupin: do you have any idea?


There is 1 answer

Jorj McKie

A PyMuPDF solution:

import fitz # PyMuPDF
doc = fitz.open()  # output PDF
pdflist = [...]  # your list of PDF files

for fpath in pdflist:
    src = fitz.open(fpath)
    doc.insert_pdf(src)
    src.close()

doc.save("output.pdf")
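
To apply the loop above row by row to the list format from the question, something like the following might work. It is only a sketch: the helper names `parse_merge_list` and `merge_job` are made up for illustration, and it assumes a plain-text list with `;` as the separator and exactly three fields per row. Rows with an empty first or second field are treated as single-input rows.

```python
import csv
import io


def parse_merge_list(text):
    """Parse 'input1;input2;output' rows into (inputs, output) pairs.

    Empty input fields are dropped, so 'a.pdf;;out.pdf' gives
    (['a.pdf'], 'out.pdf'). Rows without exactly three fields, or
    without an output name, are skipped.
    """
    jobs = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        fields = [field.strip() for field in row]
        if len(fields) != 3 or not fields[2]:
            continue  # blank or malformed row
        inputs = [field for field in fields[:2] if field]
        if inputs:
            jobs.append((inputs, fields[2]))
    return jobs


def merge_job(inputs, output):
    """Join the input PDFs, in order, into one output PDF."""
    import fitz  # PyMuPDF; imported here so the parser works without it

    doc = fitz.open()  # new, empty output PDF
    for fpath in inputs:
        src = fitz.open(fpath)
        doc.insert_pdf(src)
        src.close()
    doc.save(output)
    doc.close()
```

Reading the list file, passing its text to `parse_merge_list`, and calling `merge_job(inputs, output)` for each returned pair would then write one output file per row. With a single input, the output is a re-saved copy of that file rather than a byte-identical one; if an exact copy matters, `shutil.copyfile` is an alternative for those rows.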

If you need to join non-PDF files, you must first open each file as a document, convert it to PDF, and then join it. PyMuPDF can open about a dozen other file types as documents, but XLS, CSV, DOCX, and TXT are not among them: those must be converted to PDF first by some other means.

The supported document types include XPS, EPUB, and MOBI, plus about 10 image formats. Here is the adjusted code:

import fitz  # PyMuPDF
pathlist = [...]  # names of supported document files
doc = fitz.open()  # output PDF
for path in pathlist:
    # compare case-insensitively so "FILE.PDF" is also treated as a PDF
    if not path.lower().endswith(".pdf"):
        # non-PDF document: convert it to PDF in memory first
        srcdoc = fitz.open(path)
        pdfbytes = srcdoc.convert_to_pdf()
        srcdoc.close()
        src = fitz.open("pdf", pdfbytes)
    else:
        src = fitz.open(path)

    doc.insert_pdf(src)
    src.close()

doc.save("output.pdf")

PS: I do not think you will find a faster package for this task ...