I am trying to split a single 20-page PDF file into five PDF files: the 1st contains pages 1-3, the 2nd contains only page 4, the 3rd contains pages 5-10, the 4th contains pages 11-17, and the 5th contains pages 18-20. I need working code in Python. The code below splits the entire PDF file into single pages, but I want the pages grouped.

    from PyPDF2 import PdfFileWriter, PdfFileReader
    inputpdf = PdfFileReader(open("input.pdf", "rb"))
    for i in range(inputpdf.numPages):
        j = i + 1  # 1-based page number used in the output file name
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        with open("page%s.pdf" % j, "wb") as outputStream:
            output.write(outputStream)

2 Answers

Daweo On Best Solutions

To me this looks like a task for pdfrw. Based on an example from its GitHub repository, I wrote the following example code:

from pdfrw import PdfReader, PdfWriter
pages = PdfReader('inputfile.pdf').pages
parts = [(3,6),(7,10)]
for part in parts:
    outdata = PdfWriter(f'pages_{part[0]}_{part[1]}.pdf')
    for pagenum in range(*part):
        outdata.addpage(pages[pagenum-1])
    outdata.write()

This creates two files, pages_3_6.pdf and pages_7_10.pdf, each with 3 pages, i.e. pages 3, 4, 5 and pages 7, 8, 9, because range excludes its end value. Note the pagenum-1 in the code: the -1 is needed because PDF page numbering starts at 1, whereas Python list indices start at 0. I also used so-called f-strings to build the names of the output files. In my opinion it is a slick method, but it is not available in Python 2 and requires Python 3.6 or later (I tested my code in 3.6.7), so you might use the old formatting method instead if you wish. Remember to alter the filenames and ranges according to your needs.
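For the grouping asked about in the question, the same idea might be sketched as below. This is only a rough adaptation, not tested against a real file: it assumes the ranges are given as 1-based inclusive (first, last) pairs, and the names page_indices, split_pdf and GROUPS are hypothetical helpers introduced here for illustration. The pdfrw calls are the same ones used in the example above.

```python
def page_indices(first, last):
    """Return the 0-based indices for the 1-based inclusive range first..last."""
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {first}-{last}")
    return list(range(first - 1, last))


def split_pdf(input_path, groups):
    """Write one PDF per (first, last) group and return the output file names."""
    # Imported here so that page_indices can be used without pdfrw installed.
    from pdfrw import PdfReader, PdfWriter

    pages = PdfReader(input_path).pages
    written = []
    for first, last in groups:
        out_name = f"pages_{first}_{last}.pdf"  # hypothetical naming scheme
        writer = PdfWriter(out_name)
        for index in page_indices(first, last):
            writer.addpage(pages[index])
        writer.write()
        written.append(out_name)
    return written


# Groups from the question: pages 1-3, 4, 5-10, 11-17 and 18-20.
GROUPS = [(1, 3), (4, 4), (5, 10), (11, 17), (18, 20)]
```

Calling split_pdf("input.pdf", GROUPS) should then produce five files, pages_1_3.pdf through pages_18_20.pdf. Unlike the ranges in the example above, these pairs include the last page, which keeps the file names in line with their contents.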

Community On

If you have Python 3, you can use tika, as described in the answer to this question:

How to extract text from a PDF file?