I would like to extract the headings from PDFs, e.g.:

1 Header
1.1 Subheader
1.2 Subheader 2
2 Header 2

All of these headings are formatted in bold. I know I could use regex, but the same numbers also appear in the body text and the heading titles vary. I would like to extract the headings using PDFMiner.

I first tried regex, but the titles are too diverse. Now I have extracted the layout with PDFMiner, but in that output the headings do not look any different from the rest of the text.
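To illustrate what I am after, here is a minimal sketch, assuming pdfminer.six and assuming that the bold font can be recognised by "Bold" appearing in the font name (e.g. "ABCDEF+Arial-BoldMT"), which may not hold for every PDF. The helper names `is_bold`, `looks_like_heading` and `extract_headings` are my own, not part of PDFMiner. The idea is to go down to the individual `LTChar` objects, since that is where the font name is exposed, and keep only lines that are entirely bold and start with a section number.

```python
import re

# Assumption: headings start with a section number such as "1", "1.1" or "2.3.4".
NUMBERED = re.compile(r"^\d+(\.\d+)*\s+\S")


def is_bold(fontname):
    """Heuristic: treat a font as bold if its name contains 'bold'."""
    return "bold" in fontname.lower()


def looks_like_heading(text, fontnames):
    """True if the line starts with a section number and every glyph is bold."""
    fontnames = list(fontnames)
    if not fontnames:
        return False
    return all(is_bold(f) for f in fontnames) and bool(NUMBERED.match(text.strip()))


def extract_headings(pdf_path):
    """Return the text of all lines in the PDF that look like numbered bold headings."""
    # Imported here so the helpers above work without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    headings = []
    for page in extract_pages(pdf_path):
        for box in page:
            if not isinstance(box, LTTextContainer):
                continue
            for line in box:
                if not isinstance(line, LTTextLine):
                    continue
                # Collect the font name of every visible character on the line.
                fonts = [
                    ch.fontname
                    for ch in line
                    if isinstance(ch, LTChar) and ch.get_text().strip()
                ]
                text = line.get_text().strip()
                if looks_like_heading(text, fonts):
                    headings.append(text)
    return headings
```

Calling `extract_headings("document.pdf")` (the file name is a placeholder) should then return only the numbered bold lines. If the font names in a given PDF do not contain "Bold", comparing `LTChar.size` against the body text size might be an alternative signal.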
