What are known limitations of borb related to PDF versions?

Question

380 views Asked by gtoffoli At 07 January 2025 at 18:07

I'm new to borb, which seems to me a very promising Python package.

While trying to load a small sample of PDF documents to get some hands-on experience, I found that borb can open some of them without problems. In some cases I got messages such as "Unable to process XMP meta-data", and in other cases I got assertion errors.

So, before posting specific issues, I'm looking for information about the current limitations of borb with respect to PDF versions, and about tools I could use beforehand to detect files that should be considered invalid PDF documents. Thanks.

I'm using borb release v2.0.20, just cloned from GitHub, and Python 3.6.5 on Windows 10.

**Joris Schellekens** · Answer 1 · 2022-03-05T14:49:11+00:00

Disclaimer: I am Joris Schellekens, author of the aforementioned library borb.

The problem is that the PDF spec (ISO-32000) leaves some room for interpretation at various points throughout. That means some PDF libraries will interpret the spec in a given way, and produce documents that may not always be compliant according to other tools.

borb tends to be very strict when it comes to PDF parsing. As soon as an error is detected, it will throw the stack trace right back at you, whereas other PDF software (e.g. Adobe Reader) tends to be much more forgiving about what it accepts as an input PDF document.
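
Because a failed parse surfaces as a raised exception, a sample of files can be sorted into "loads" and "fails" before any issue is reported. The following is only a minimal sketch: `triage_pdfs` and `borb_loader` are hypothetical helper names, and the `PDF.loads` call assumes the borb 2.x reading API.

```python
from typing import Callable, Dict, Iterable, Optional


def triage_pdfs(paths: Iterable[str], loader: Callable) -> Dict[str, Optional[str]]:
    """Try to load every file; map each path to None (loaded) or an error summary."""
    results = {}  # type: Dict[str, Optional[str]]
    for path in paths:
        try:
            with open(path, "rb") as fh:
                loader(fh)
            results[str(path)] = None
        except Exception as exc:
            # AssertionError is a subclass of Exception, so it is caught here too.
            results[str(path)] = "{}: {}".format(type(exc).__name__, exc)
    return results


def borb_loader(fh):
    """Hypothetical adapter; assumes the borb 2.x API (PDF.loads on a binary file handle)."""
    from borb.pdf import PDF  # imported lazily so triage_pdfs itself does not need borb

    return PDF.loads(fh)
```

Calling `triage_pdfs(["a.pdf", "b.pdf"], borb_loader)` would then give one entry per file. The files that map to an error summary are the ones worth attaching to a ticket.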

Although I certainly understand your frustration at being unable to process what you perceive to be "perfectly good PDF documents", I assure you that processing them might lead to even more issues.

I know for instance that there are cases where Adobe Reader tries to correct a bad PDF document, and as a result ends up corrupting the signatures in the document (very undesirable).

If you experience issues, and you can share the PDF, feel free to log a ticket on the GitHub repository.

Off the top of my head, the current limitations of borb are:

- signatures
- encrypted PDF documents
- XREF not found
- some images with transparent pixels
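
Some of the items above (encryption, a missing XREF) can often be spotted from the raw bytes before a file is handed to any parser. The sketch below is a heuristic under stated assumptions, not a validator and not part of borb: `quick_pdf_check` is a hypothetical helper, and it only looks for the `%PDF-x.y` header, the `startxref` and `%%EOF` markers near the end of the file, and an `/Encrypt` key anywhere in the data.

```python
import re
from typing import Dict

# The header carries the declared version, e.g. "%PDF-1.7".
_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def quick_pdf_check(data: bytes) -> Dict[str, object]:
    """Heuristic byte-level look at a PDF file; it does not parse the document."""
    head = data[:1024]   # the header is expected at (or near) the start
    tail = data[-2048:]  # startxref and %%EOF are expected near the end
    match = _HEADER.search(head)
    return {
        "version": match.group(1).decode("ascii") if match else None,
        "has_startxref": b"startxref" in tail,
        "has_eof_marker": b"%%EOF" in tail,
        # An /Encrypt key usually means the document is encrypted.
        "maybe_encrypted": b"/Encrypt" in data,
    }
```

A file with no `version`, or without `startxref`, is a likely candidate for a parsing error. Note that the header version can be overridden by a `/Version` entry in the document catalog, so the reported value is only the declared one.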
