Is there any way to find out whether a PDF file is compressed or not?


We are using iText (itextpdf) to compress PDFs. The issue is that we only want to compress the files that were compressed before being uploaded to our site; if a file was uploaded without compression, we would like to leave it as it is.

To do that, we need to identify whether a PDF is compressed or not. Is there any way to identify this using iText or some other tool?

I have tried to Google it but couldn't find an appropriate answer.

Kindly let me know if you have any ideas.

Thanks


There are 2 answers

mark stephens (best answer)

There are several types of compression you can get in a PDF. Data for objects can be compressed and objects can be compressed into object streams.

Bruno Lowagie

I voted Mark's answer up because he's right: you won't get an answer if you're not more specific. I'll add my own answer with some extra information.

In PDF 1.0, a PDF file consisted of a mix of ASCII characters for the PDF syntax and binary code for objects such as images. A page stream would contain visible PDF operators and operands, for instance:

56.7 748.5 m
136.2 748.5 l
S

This code tells you that a line has to be drawn (S) between the coordinate (x = 56.7; y = 748.5) (because that's where the cursor is moved to with the m operator) and the coordinate (x = 136.2; y = 748.5) (because a path was constructed using the l operator that adds a line).

Starting with PDF 1.2, one could start using filters for such content streams (page content streams, form XObjects). In most cases, you'll discover a /Filter entry with value /FlateDecode in the stream dictionary. You'll hardly find any "modern" PDFs of which the contents aren't compressed.
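To make the /Filter entry concrete, here is a rough sketch in plain Python (standard library only, not iText) that Flate-compresses the three-line content stream above and wraps it in a stream object. The helper name has_flate_filter is made up for this illustration, and the check is a simple text heuristic on the stream dictionary, not a real PDF parser.

```python
import zlib

# The uncompressed page content stream from the example above.
content = b"56.7 748.5 m\n136.2 748.5 l\nS\n"

# /FlateDecode data uses the zlib/deflate format, so zlib.compress
# produces bytes that a PDF reader could decode with that filter.
compressed = zlib.compress(content)

# A stream object as it could appear in the PDF body.
# /Length counts the compressed bytes, not the original ones.
stream_obj = (
    b"<< /Length " + str(len(compressed)).encode("ascii")
    + b" /Filter /FlateDecode >>\n"
    + b"stream\n" + compressed + b"\nendstream"
)


def has_flate_filter(stream_object: bytes) -> bool:
    """Hypothetical helper: look for /FlateDecode in the dictionary
    that precedes the first 'stream' keyword."""
    dictionary = stream_object.split(b"stream", 1)[0]
    return b"/FlateDecode" in dictionary


# Decompressing gives back the readable operators and operands.
restored = zlib.decompress(compressed)
print(has_flate_filter(stream_obj))
print(restored.decode("ascii"))
```

For a stream this short, the compressed bytes are not necessarily smaller than the original; the point is only to show what the dictionary and the data look like once a filter is applied.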

Up until PDF 1.5, all indirect objects in a PDF document, as well as the cross-reference table, were stored in ASCII in a PDF file. Starting with PDF 1.5, specific types of objects can be stored in an object stream. The cross-reference table can also be compressed into a stream. iText's PdfReader has an isNewXrefType() method to check if this is the case. Maybe that's what you're looking for. Maybe you have PDFs that need to be read by software that isn't able to read PDFs of this type, but... you're not telling us.
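If you only want a quick impression without iText, a crude byte scan can hint at whether a file uses the PDF 1.5 features. The sketch below is plain Python with the standard library; describe_structure and the two sample byte strings are invented for this illustration. It relies on the fact that the dictionaries of object streams and cross-reference streams carry /Type /ObjStm and /Type /XRef. It is a heuristic only: a stray match inside binary data is possible, so a real PDF library is the reliable route.

```python
import re

# Type entries found in the dictionaries of object streams
# and cross-reference streams (both introduced in PDF 1.5).
OBJ_STREAM = re.compile(rb"/Type\s*/ObjStm\b")
XREF_STREAM = re.compile(rb"/Type\s*/XRef\b")


def describe_structure(pdf_bytes: bytes) -> dict:
    """Hypothetical helper: report which PDF 1.5 structures seem present."""
    return {
        "object_streams": bool(OBJ_STREAM.search(pdf_bytes)),
        "xref_stream": bool(XREF_STREAM.search(pdf_bytes)),
    }


# Made-up fragments, just enough to exercise the scan.
classic = (
    b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\ntrailer\n<< /Size 2 >>\nstartxref\n9\n%%EOF"
)
modern = (
    b"%PDF-1.5\n5 0 obj\n<< /Type /ObjStm /N 2 /First 10 /Length 3 >>\n"
    b"stream\n...\nendstream\nendobj\n"
    b"6 0 obj\n<< /Type /XRef /Size 7 /W [1 2 1] /Length 3 >>\n"
    b"stream\n...\nendstream\nendobj\nstartxref\n120\n%%EOF"
)

print(describe_structure(classic))
print(describe_structure(modern))
```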

Maybe we're completely misinterpreting the question. Maybe you want to know if you're receiving an actual PDF or a zip file with a PDF. Or maybe you want to really data-mine the different filters used inside the PDF. In short: your question isn't very clear, and I hope this answer explains why you should clarify.
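If the question really is "did I receive a PDF or a zip file containing a PDF", looking at the first bytes of the upload is usually enough. Below is a minimal sketch in plain Python (standard library only); sniff_upload is a made-up name, and searching the first 1024 bytes for the PDF header is an assumption chosen to tolerate a few junk bytes before it.

```python
import io
import zipfile


def sniff_upload(data: bytes) -> str:
    """Hypothetical helper: classify an upload by its leading bytes."""
    # Zip archives with at least one entry start with a local file header.
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    # A PDF file starts with a "%PDF-" header near the beginning.
    if b"%PDF-" in data[:1024]:
        return "pdf"
    return "unknown"


# A tiny stand-in for a PDF, and the same bytes wrapped in a zip archive.
pdf = b"%PDF-1.4\n%%EOF\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("doc.pdf", pdf)
zipped = buffer.getvalue()

print(sniff_upload(pdf))
print(sniff_upload(zipped))
print(sniff_upload(b"hello"))
```

The zip check comes first on purpose: an archive that stores the PDF without deflating it would otherwise also contain "%PDF-" within its first bytes.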