I would like to scrape the headings of PDFs, for example:

1 Header
1.1 Subheader
1.2 Subheader 2
2 Header 2
All of these headers are formatted in bold. I know I could use regex, but the numbers also appear in the body text and the header titles differ. I would like to scrape the headers using PDFMiner.
I first tried regex, but the titles are too diverse. Now I have extracted the layout with PDFMiner; however, in that output the headers do not differ from the rest of the text.
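
One possible direction, as a rough sketch rather than a confirmed solution: combine the numbering pattern with the font of each line, so that numbers in the body text are ignored unless the whole line is bold. The sketch assumes pdfminer.six, whose `LTChar` objects carry a `fontname` attribute, and it assumes that bold fonts have "bold" somewhere in their name (for example `ABCDEF+Arial-BoldMT`), which depends on how the PDF was produced. The names `HEADING_RE`, `is_bold_heading` and `extract_headings` are made up for illustration.

```python
import re

# Matches numbering such as "1", "1.1" or "2.3.4" followed by a title.
HEADING_RE = re.compile(r"^\d+(?:\.\d+)*\s+\S")


def is_bold_heading(text, fontnames):
    """Return True if a line looks like a numbered heading set fully in bold.

    `fontnames` holds the font names of the non-blank characters of the line.
    Assumption: bold fonts contain "bold" in their name; not every PDF does this.
    """
    text = text.strip()
    if not text or not fontnames:
        return False
    all_bold = all("bold" in name.lower() for name in fontnames)
    return all_bold and HEADING_RE.match(text) is not None


def extract_headings(pdf_path):
    """Yield the text of every line that passes is_bold_heading()."""
    # Imported here so the helper above works without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page in extract_pages(pdf_path):
        for element in page:
            # Skip figures, rectangles, curves and other non-text objects.
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                # Collect the fonts of all visible characters on this line.
                fonts = {
                    ch.fontname
                    for ch in line
                    if isinstance(ch, LTChar) and ch.get_text().strip()
                }
                text = line.get_text()
                if is_bold_heading(text, fonts):
                    yield text.strip()
```

Usage would be something like `for heading in extract_headings("document.pdf"): print(heading)`, where `document.pdf` is a placeholder path. If the font names in a given PDF do not reveal the weight, comparing `LTChar.size` against the body text size might be an alternative signal.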