PDF Tabular Data Extraction using pdftabextract


I am trying to extract tabular data from text-based PDFs. The PDFs come in different layouts, so I need a generalised solution. I came across a library named "pdftabextract" for this task, but it was designed for, and is demonstrated on, scanned (OCR-processed) documents.

I want to use it for my text-based PDFs, but I don't know how.

Article link: https://datascience.blog.wzb.eu/2017/02/16/data-mining-ocr-pdfs-using-pdftabextract-to-liberate-tabular-data-from-scanned-documents/

The article above shows a step-by-step approach for scanned documents, but I don't know how to adapt it to text-based PDFs. Please help.
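To make the goal concrete, here is a minimal sketch of the kind of processing I have in mind. It assumes the PDF has already been converted to XML with `pdftohtml -xml` (from poppler-utils), which should emit `<text>` elements with `top` and `left` attributes for text-based PDFs without any OCR step. It does not use pdftabextract's API at all: it uses only the Python standard library, and the names `SAMPLE_XML`, `group_rows`, and the `tolerance` value are hypothetical placeholders. The sample data is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like `pdftohtml -xml` output (made-up content).
SAMPLE_XML = """<pdf2xml>
<page number="1" top="0" left="0" height="1188" width="918">
<text top="100" left="50" width="40" height="12" font="0">Name</text>
<text top="101" left="200" width="30" height="12" font="0">Qty</text>
<text top="130" left="50" width="45" height="12" font="0"><b>Apple</b></text>
<text top="129" left="200" width="10" height="12" font="0">3</text>
</page>
</pdf2xml>"""


def group_rows(page, tolerance=5):
    """Group one page's text boxes into rows by 'top', ordering cells by 'left'.

    `tolerance` (in page units) is an assumed value, not a recommended setting.
    """
    boxes = []
    for node in page.iter("text"):
        # itertext() also collects text nested in inline tags such as <b>.
        content = "".join(node.itertext()).strip()
        if content:
            boxes.append((int(node.get("top")), int(node.get("left")), content))
    boxes.sort()

    rows = []
    for top, left, content in boxes:
        # Start a new row when the box sits clearly below the row's first box.
        if not rows or top - rows[-1]["top"] > tolerance:
            rows.append({"top": top, "cells": []})
        rows[-1]["cells"].append((left, content))
    # Within each row, order the cells from left to right.
    return [[text for _, text in sorted(row["cells"])] for row in rows]


# Map each page number to its list of rows.
TABLES = {
    page.get("number"): group_rows(page)
    for page in ET.fromstring(SAMPLE_XML).iter("page")
}
print(TABLES)
```

This only groups text into rows. Detecting column boundaries, merged cells, and varying layouts is the part I am hoping pdftabextract can help with.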
