I am trying to extract tabular data from text-based PDFs. The PDFs come in different formats, so I need a generalised solution. I came across a library named "pdftabextract" for this task, but it was designed for scanned documents.
I want to use it for my text-based PDFs, but I don't know how.
Article link: https://datascience.blog.wzb.eu/2017/02/16/data-mining-ocr-pdfs-using-pdftabextract-to-liberate-tabular-data-from-scanned-documents/
The article above shows a step-by-step approach for scanned documents, but I don't know how to adapt it to text-based PDFs. Please help.