How to make PDFMiner Six detect bullet points (including alphanumeric bullets) when parsing documents?

I am currently using PDFMiner.six to parse documents, and I would like it to detect bullet points (including alphanumeric bullets such as "a.", "i.", and "1."). Right now it treats them as ordinary characters, and I'm wondering whether I'm missing some PDFMiner.six functionality that would make it easier to tell when a bullet or a group of bullets appears.

My current assumption is that I'll have to write my own algorithm to find these special characters and substrings at the start of each text block, but I'm fairly new to this and would love to be pleasantly surprised.
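
The line-start approach described above could be sketched in Python roughly as follows. This is a heuristic, not a built-in PDFMiner.six feature. `extract_pages`, `LTTextContainer`, `LTTextLine` and `get_text()` are real pdfminer.six layout APIs, but the regular expression, the bullet symbol set, the helper names `bullet_marker` and `iter_text_lines`, and the script name `find_bullets.py` are hypothetical choices made for illustration. Depending on the PDF and the layout parameters (`LAParams`), a bullet glyph may also land on its own line or be separated from its text, so the results need checking against real documents.

```python
import re
import sys
from typing import Iterator, Optional, Tuple

# Heuristic pattern for a list marker at the very start of a text line.
# The symbol set and the enumeration formats are assumptions; extend them
# to match the documents you actually parse.
BULLET_RE = re.compile(
    r"""
    ^\s*
    (                                            # group 1: the marker itself
        [•◦▪▫●○■□‣⁃\uf0b7\-–*]                  # symbol bullets; U+F0B7 is a private-use
                                                 # code point some Symbol-font bullets map to
      | \(?\d{1,3}[.)]                           # 1.   1)   (1)
      | \(?[a-z][.)]                             # a.   A)   (b)
      | \(?(?=[ivx])x{0,3}(?:ix|iv|v?i{0,3})[.)] # ii.  iv)  (xii)
    )
    (?:\s+|$)                                    # marker followed by whitespace or end of line
    """,
    re.VERBOSE | re.IGNORECASE,
)


def bullet_marker(line: str) -> Optional[str]:
    """Return the list marker at the start of `line`, or None if there is none."""
    match = BULLET_RE.match(line)
    return match.group(1) if match else None


def iter_text_lines(pdf_path: str) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, text) for every LTTextLine inside a text box."""
    # Imported lazily so bullet_marker() can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer, LTTextLine

    for page_no, page_layout in enumerate(extract_pages(pdf_path), start=1):
        for element in page_layout:
            if isinstance(element, LTTextContainer):
                # A text box iterates over its lines; skip anything that isn't a line.
                for text_line in element:
                    if isinstance(text_line, LTTextLine):
                        yield page_no, text_line.get_text()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python find_bullets.py FILE.pdf")
    else:
        for page_no, text in iter_text_lines(sys.argv[1]):
            marker = bullet_marker(text)
            if marker is not None:
                print(f"page {page_no}: {marker!r}  {text.strip()}")
```

Run as `python find_bullets.py document.pdf`. The pattern will also flag some non-list lines (for example a sentence starting "A. Smith"), so treat hits as candidates. To find groups of bullets rather than single ones, one untested option is to also yield each line's left edge (`text_line.x0`), treat consecutive marked lines on a page as one list, and use differences in `x0` to infer nesting.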
