How can I get the total number of pages of a PDF file using PDFMiner in Python?

Question

20.7k views Asked by Malik Anas Ahmad At 06 January 2025 at 09:22

In pypdf, I can get the total number of pages of a PDF file via:

from pypdf import PdfReader

reader = PdfReader("example.pdf")
no_of_pages = len(reader.pages)

How can I get this using PDFMiner?

There are 5 answers

Martin Thoma On 04 May 2021 at 11:31

I realize you were asking for PDFMiner. However, people coming via Google Search to this question might also be interested in alternatives to PDFMiner.

PyPDF2

PyPDF2 is a pure-Python alternative whose text extraction and decryption have improved a lot in recent releases:

from PyPDF2 import PdfReader

reader = PdfReader("example.pdf")
pdf_page_count = len(reader.pages)

pikepdf

from pikepdf import Pdf
pdf_doc = Pdf.open('fourpages.pdf')
pdf_page_count = len(pdf_doc.pages)

penduDev On 22 May 2020 at 11:58

I found PDFMiner very slow at getting the total number of pages. This was a cleaner and faster solution for me:

pip3 install PyPDF2

from PyPDF2 import PdfReader

def get_pdf_page_count(path):
    with open(path, 'rb') as fl:
        # PdfReader and len(reader.pages) replace the deprecated
        # PdfFileReader and getNumPages(), which PyPDF2 3.0 removed
        reader = PdfReader(fl)
        return len(reader.pages)

José Lacerda On 23 May 2020 at 21:51

Using pdfminer.six, you just need to import the high-level function extract_pages, convert the generator into a list, and take its length.

from pdfminer.high_level import extract_pages

pdf_file = "example.pdf"  # placeholder path to your PDF
print(len(list(extract_pages(pdf_file))))
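
Building the full list keeps every page layout in memory just to count them. Here is a minimal sketch that consumes the same extract_pages generator lazily instead. The helper names count_items and count_pdf_pages are hypothetical, and the pdfminer.six import is deferred into the function only so that the generic helper can be used on its own:

```python
def count_items(iterable):
    """Count the items of any iterable without storing them."""
    return sum(1 for _ in iterable)


def count_pdf_pages(path):
    """Count pages by consuming pdfminer.six's extract_pages generator."""
    # Imported here so count_items stays usable without pdfminer.six installed.
    from pdfminer.high_level import extract_pages

    return count_items(extract_pages(path))
```

Calling count_pdf_pages("example.pdf") should give the same number as the list-based version. It still runs layout analysis on every page, so it saves memory rather than time.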

Mangohero1 On 23 August 2017 at 14:12

Using pdfminer, import the necessary modules.

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage

Create a PDF parser object associated with the file object.

fp = open('your_file.pdf', 'rb')
parser = PDFParser(fp)

Create a PDF document object that stores the document structure.

document = PDFDocument(parser)

Iterate over the pages yielded by PDFPage.create_pages(), incrementing a counter for each page.

num_pages = 0
for page in PDFPage.create_pages(document):
    num_pages += 1
print(num_pages)

Pete On 07 December 2017 at 03:03 (accepted answer)

I hate to just leave a code snippet. For context, the current pdfminer.six repo is where you might be able to learn a little more about the resolve1 function.

As you're working with PDFMiner, you might print a value and come across a PDFObjRef object, which is an indirect reference to another object in the file. You can use resolve1 to expand such a reference into the object it points to (usually a dictionary).

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfinterp import resolve1

file = open('some_file.pdf', 'rb')
parser = PDFParser(file)
document = PDFDocument(parser)

# This will give you the count of pages
print(resolve1(document.catalog['Pages'])['Count'])
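
If you want this as a reusable function, here is a rough sketch under a few assumptions of mine: the name get_page_count is hypothetical, the file is closed with a context manager, and when the page tree has no usable Count entry it falls back to counting the pages from PDFPage.create_pages. The imports sit inside the function only so that defining it does not require pdfminer.six.

```python
def get_page_count(path):
    """Return the page count stored in the root of the PDF's page tree."""
    # Deferred imports: pdfminer.six is only needed when the function runs.
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdftypes import resolve1

    with open(path, "rb") as fp:  # the file is closed even if parsing fails
        document = PDFDocument(PDFParser(fp))
        # Both the page tree root and its Count may be indirect references.
        pages = resolve1(document.catalog.get("Pages"))
        count = resolve1(pages.get("Count")) if isinstance(pages, dict) else None
        if isinstance(count, int):
            return count
        # Assumed fallback for files without a usable Count entry.
        return sum(1 for _ in PDFPage.create_pages(document))
```

Reading Count only touches the document catalog, so it should stay fast on large files. The fallback walks the whole page tree and is slower.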
