How to merge huge PDF files without loading them fully into memory with iText 7?


I am trying to merge two large PDF files without loading them fully into memory.

I tried with a PdfMerger, and also manually without a PdfMerger, using code like this:

// FileMode.Create truncates an existing output file; OpenOrCreate would keep stale trailing bytes.
using (var writer = new PdfWriter(new FileStream(@"C:\Test\OutBig.pdf", FileMode.Create)))
using (var outputDocument = new PdfDocument(writer))
using (var inputDoc = new PdfDocument(new PdfReader(@"C:\Test\InBig.pdf")))
{
    for (int i = 1; i <= inputDoc.GetNumberOfPages(); i++)
    {
        // Copy page i as a form XObject and draw it on a fresh output page
        var newp = outputDocument.AddNewPage();
        var canvas = new PdfCanvas(newp);
        var origPage = inputDoc.GetPage(i);
        var copy = origPage.CopyAsFormXObject(outputDocument);
        canvas.AddXObject(copy, 0, 0);

        // Attempts to release everything tied to this page
        copy.Flush();
        origPage = null;
        canvas.Release();
        newp.Flush();
        writer.Flush();
        canvas = null;
        newp = null;
    }
}

The code works, but every page is loaded into memory and stays loaded, so the process ends up holding more than 1 GB.

Do you know of any way to merge two PDF files with iText 7 without loading them fully into memory?

Regards,

Patrice


There are 2 answers

1
Jon Coello

I have experimented with a few components now (Aspose, iTextSharp and Telerik), and Telerik seems to have cracked it.

I followed the steps below and memory usage remained low.

Sample Code

var files = Directory.GetFiles(bundlePath);

using (PdfStreamWriter fileWriter = new PdfStreamWriter(File.OpenWrite(outputFile)))
{
    // Iterate through the files you would like to merge
    foreach (string documentName in files)
    {
        // Open each of the files
        using (PdfFileSource fileToMerge = new PdfFileSource(File.OpenRead(documentName)))
        {
            // Iterate through the pages of the current document
            foreach (PdfPageSource pageToMerge in fileToMerge.Pages)
            {
                // Append the current page to the fileWriter, which holds the result FileStream
                fileWriter.WritePage(pageToMerge);
            }
        }
    }
}

[Screenshots comparing ITextSharp, Aspose and Telerik appeared here; the images are not available.]

0
Alexey Subach

There are several ways to lower memory consumption when copying big documents with iText 7. One of them is to leverage the fact that objects are read on demand: you can copy pages from the source document to the destination document in multiple batches, reopening and closing the source document for each batch so that whatever was read for one batch can be released before the next one starts.

Here is the code in Java; it should convert to C# almost entirely by capitalizing the first letter of each method name.

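import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;

// IN1 and OUT are the source and destination file paths (String constants defined elsewhere).
// Open the source once just to learn its page count, then close it again.
PdfDocument doc1 = new PdfDocument(new PdfReader(IN1));
int numOfPages = doc1.getNumberOfPages();
doc1.close();
PdfDocument outDoc = new PdfDocument(new PdfWriter(OUT));
int numOfPagesPerDocumentOpen = 10;
for (int i = 1; i <= numOfPages; ) {
    int firstPageToCopy = i;
    int lastPageToCopy = Math.min(i + numOfPagesPerDocumentOpen - 1, numOfPages);
    // Reopen the source for this batch only
    doc1 = new PdfDocument(new PdfReader(IN1));
    doc1.copyPagesTo(firstPageToCopy, lastPageToCopy, outDoc);
    // Flush the last (lastPageToCopy - firstPageToCopy + 1) pages of the output
    for (int j = 0; j <= lastPageToCopy - firstPageToCopy; j++) {
        outDoc.getPage(outDoc.getNumberOfPages() - j).flush(true);
    }
    // Closing the source drops the objects that were read for this batch
    doc1.close();
    i = lastPageToCopy + 1;
}
outDoc.close();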
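
If you want to sanity-check the batch boundaries before running this against a huge file, the range arithmetic from the loop above can be pulled out and tried on its own. This is only a minimal sketch: PageBatchPlanner and plan are hypothetical names made up for illustration, not part of iText, and the code only reproduces the first/last page computation, not any PDF handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of iText): computes the page ranges that the
// batched copy loop would pass to copyPagesTo, without touching any PDF.
class PageBatchPlanner {

    /** Returns inclusive, 1-based {first, last} page ranges covering pages 1..numOfPages. */
    static List<int[]> plan(int numOfPages, int pagesPerOpen) {
        if (numOfPages < 0 || pagesPerOpen < 1) {
            throw new IllegalArgumentException("numOfPages must be >= 0 and pagesPerOpen >= 1");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int first = 1; first <= numOfPages; ) {
            // Same arithmetic as lastPageToCopy in the loop above
            int last = Math.min(first + pagesPerOpen - 1, numOfPages);
            ranges.add(new int[] {first, last});
            first = last + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 25 pages in batches of 10: prints 1-10, 11-20 and 21-25, one per line
        for (int[] range : plan(25, 10)) {
            System.out.println(range[0] + "-" + range[1]);
        }
    }
}
```

The batch size is a trade-off: a smaller value should keep fewer source objects alive at once, at the cost of reopening the source document more often.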