PDFBox PDFMergerUtility: how do I tell which sources failed?

Question

Asked by Greg Valcourt on 12 February 2016 at 23:12 · 1.8k views

So, I'm doing this:

PDFMergerUtility mergePdf = new PDFMergerUtility();

for (int i = 0; i < filePaths.size(); i++) {
    mergePdf.addSource(filePaths.get(i));
}

mergePdf.setDestinationFileName(tempFile.getAbsolutePath());
mergePdf.mergeDocuments();

That works great until an exception is thrown on a PDF that PDFBox can't parse (either a corrupt PDF or something PDFBox can't handle). It doesn't happen very often.

I would like to be able to tell which source(s) it failed on, exclude them from a subsequent merge, and tell the user which documents failed.

Can this be done?

UPDATE:

Here's my exception:

java.io.IOException: Error: Expected a long type at offset 591535, instead got 'E^TQE^TQE^TQE^TQ [... long run of binary stream data elided ...] ^T<84>f<96><8a>'
    at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1695)
    at org.apache.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1623)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:614)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1220)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1187)
    at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:237)
    at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:194)
    at myapp.util.DocumentImage.combinePDFs(DocumentImage.java:289)
    at myapp.webapp.download.DownloadLatestForCLO.generate(DownloadLatestForCLO.java:73)
    at myapp.webapp.download.DownloadLatestForCLO.getFileSize(DownloadLatestForCLO.java:64)
    at myapp.webapp.download.DownloadServlet.handleRequest(DownloadServlet.java:58)
    at myapp.webapp.download.DownloadServlet.doGet(DownloadServlet.java:32)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:200)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Original Q&A

There is 1 answer

**PaulG** · Answer 1 · 2016-02-12T23:33:53+00:00

Luckily, PDFBox is open source. Having downloaded the latest source (2.0.0 RC3 at the time of writing), you can look at \pdfbox-2.0.0-RC3\pdfbox\src\main\java\org\apache\pdfbox\multipdf\PDFMergerUtility.java (around line 188) and see that it lets this exception propagate up from a lower level; it does not catch it and add details of the file that caused the error.

Until this is fixed, you will have to catch the exception in your own code, then iterate over the source files, loading and closing each one, to find the one(s) that can't be processed, and report those yourself.
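A minimal sketch of that screening pass, in plain Java so the logic stands on its own. `SourceScreening`, `SourceCheck`, and `Result` are hypothetical names, not PDFBox API; the actual "try to open this file" step is passed in as a lambda so the loop does not depend on a particular PDFBox version. It tries every file rather than stopping at the first failure, so one pass reports all unreadable sources.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SourceScreening {

    /** Hypothetical hook: tries to open one source, throwing IOException if it can't be parsed. */
    @FunctionalInterface
    interface SourceCheck {
        void check(String path) throws IOException;
    }

    /** Outcome of screening: paths that opened, and failed paths mapped to their error message. */
    static final class Result {
        final List<String> good = new ArrayList<>();
        final Map<String, String> failed = new LinkedHashMap<>(); // keeps input order
    }

    /**
     * Runs the check on every path instead of stopping at the first failure,
     * so all unreadable files are found in a single pass.
     */
    static Result screen(List<String> paths, SourceCheck check) {
        Result result = new Result();
        for (String path : paths) {
            try {
                check.check(path);
                result.good.add(path);
            } catch (IOException e) {
                result.failed.put(path, e.getMessage());
            }
        }
        return result;
    }
}
```

With PDFBox 1.8.x or 2.0.x the check could be `path -> PDDocument.load(new File(path)).close()`, since both versions have `PDDocument.load(File)`. Add only `result.good` to the `PDFMergerUtility` and show the keys of `result.failed` to the user. The cost is that each good file is parsed twice, and a file that loads cleanly could in principle still fail during the merge itself, so it is worth keeping the outer `catch` as well.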

If you are interested in fixing the problem at the source (inside PDFBox), this is the edit to make and submit to the PDFBox project team. Once that fix is incorporated into a release and you upgrade to it, you can safely remove your iteration code:

        try
        {
            MemoryUsageSetting partitionedMemSetting = memUsageSetting != null ? 
                    memUsageSetting.getPartitionedCopy(sources.size()+1) :
                    MemoryUsageSetting.setupMainMemoryOnly();
            Iterator<InputStream> sit = sources.iterator();
            destination = new PDDocument(partitionedMemSetting);

            while (sit.hasNext())
            {
                sourceFile = sit.next();
                source = PDDocument.load(sourceFile, partitionedMemSetting);
                tobeclosed.add(source);
                appendDocument(destination, source);
            }
            if (destinationStream == null)
            {
                destination.save(destinationFileName);
            }
            else
            {
                destination.save(destinationStream);
            }
        }
        catch (IOException e)
        {
            /* Insert code to place this in an inner exception and throw one including the named 'sourceFile' */
        }
        finally
        {
            ...
        }
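A minimal sketch of the wrapping that the `catch` comment asks for, in plain Java. `NamedSourceLoop`, `SourceAction`, `processAll`, and the `labels` list are hypothetical, not PDFBox API. In the excerpt above the loop iterates over `InputStream`s, and a stream carries no file name, so this version keeps one label per source and reports the failure by position and label, attaching the original exception as the cause.

```java
import java.io.IOException;
import java.util.List;

final class NamedSourceLoop {

    /** Hypothetical stand-in for "load this source and append it to the destination". */
    @FunctionalInterface
    interface SourceAction<T> {
        void apply(T source) throws IOException;
    }

    /**
     * Applies the action to each source in order. On the first IOException it rethrows
     * a new IOException naming the failing source (1-based position plus label),
     * with the original exception kept as the cause.
     */
    static <T> void processAll(List<T> sources, List<String> labels, SourceAction<T> action)
            throws IOException {
        if (sources.size() != labels.size()) {
            throw new IllegalArgumentException("need exactly one label per source");
        }
        for (int i = 0; i < sources.size(); i++) {
            try {
                action.apply(sources.get(i));
            } catch (IOException e) {
                throw new IOException("Failed to merge source #" + (i + 1)
                        + " (" + labels.get(i) + "): " + e.getMessage(), e);
            }
        }
    }
}
```

The same loop can be used from your own code without patching PDFBox: load each file yourself and call the public `PDFMergerUtility.appendDocument(destination, source)` inside the action, keeping every loaded source open until the destination is saved (as the `tobeclosed` list in the excerpt does). The message of the rethrown exception then names the file to drop before retrying.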

TechQA.
