I'm writing a small Python program that splits a PDF into chunks with a given number of pages. The splitting itself works perfectly, but the accessibility information needed for Web Content Accessibility Guidelines (WCAG) compliance is lost in the output files.
How can I preserve the formatting necessary to keep the output files web accessible?
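To make "lost" concrete, here is a rough sketch of how the presence of accessibility tagging could be checked. It assumes that the loss shows up as a missing `/StructTreeRoot` entry in the PDF catalog, which is where a tagged PDF stores its structure tree. `has_structure_tree` is a hypothetical helper name; `pdf_catalog()` and `xref_get_key()` are PyMuPDF `Document` methods.

```python
def has_structure_tree(doc):
    """Return True if the PDF catalog has a /StructTreeRoot entry.

    `doc` is expected to be an open PyMuPDF document (fitz.Document).
    This only tests that the entry exists, not that the tagging is
    complete or WCAG-compliant.
    """
    catalog_xref = doc.pdf_catalog()  # xref number of the PDF catalog
    key_type, _value = doc.xref_get_key(catalog_xref, "StructTreeRoot")
    # PyMuPDF reports a missing key as the type string "null"
    return key_type != "null"


# Usage (requires PyMuPDF and real files, so left as comments):
#   import fitz
#   print(has_structure_tree(fitz.open("original.pdf")))
#   print(has_structure_tree(fitz.open("pages_1_to_4.pdf")))
```

If the first call returns True and the second returns False, the structure tree was not carried over into the chunk.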
First I tried PyPDF2 and got the splitting to work. I then switched to PyMuPDF, a more advanced library that should apparently be good at this type of thing.
Here is a simplified version of my method (I excluded a lot of irrelevant code for the sake of my question):
import os

import fitz  # PyMuPDF


def split_pdf(file_path, chunk_size):
    # file_path: path to the source PDF; chunk_size: pages per chunk (int)
    output_folder = f"{os.path.splitext(file_path)[0]}_output"
    os.makedirs(output_folder, exist_ok=True)  # create the folder if it is missing

    pdf_document = fitz.open(file_path)
    total_pages = pdf_document.page_count

    for start in range(0, total_pages, chunk_size):
        end = min(start + chunk_size, total_pages)
        output_filename = os.path.join(output_folder, f"pages_{start + 1}_to_{end}.pdf")

        doc_subset = fitz.open()  # Create a new, empty PDF to hold the subset
        # Copy the whole chunk in one call; to_page is inclusive
        doc_subset.insert_pdf(pdf_document, from_page=start, to_page=end - 1)
        doc_subset.save(output_filename)
        doc_subset.close()

    pdf_document.close()
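The page-range arithmetic in the loop above can be checked on its own, without any PDF library. This is a minimal sketch; `chunk_ranges` is a hypothetical helper name that is not part of my actual code, and it uses the same 0-based start and exclusive end as the loop.

```python
def chunk_ranges(total_pages, chunk_size):
    """Yield (start, end) pairs of 0-based page indices, end exclusive."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, total_pages, chunk_size):
        # The last chunk may be shorter than chunk_size
        yield start, min(start + chunk_size, total_pages)


# Example: a 10-page document split into chunks of 4 pages
for start, end in chunk_ranges(10, 4):
    print(f"pages_{start + 1}_to_{end}.pdf")
# prints:
#   pages_1_to_4.pdf
#   pages_5_to_8.pdf
#   pages_9_to_10.pdf
```

So the splitting logic itself is fine; the problem is only what gets copied into each chunk.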
Any help would be greatly appreciated!