Accessing PDF table

Question

505 views Asked by Effe Pelosa At 14 December 2016 at 09:47

I'm parsing PDFs with PDFMiner, using it as a library in my Python script.

In most of these PDFs there is a table, where one of the columns is named "company".

Is there a way to:

1. detect whether that table exists in the PDF, and
2. get all the company names (i.e. all the entries in the 2nd column of the table)?

Original Q&A

There is 1 answer

**Effe Pelosa** · Answer 1 · 2016-12-14T15:32:54+00:00

The best method I have found so far is to use the HTMLConverter class in the pdfminer library. It converts the PDF to HTML, where tables, rows and columns are easier to pick out. That was true in my case at least; it may work with all kinds of tables in a PDF file.
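
For anyone who wants a starting point, here is a rough sketch of that approach, not the exact code I used. `pdf_to_html` uses the usual pdfminer calls (`PDFResourceManager`, `HTMLConverter`, `PDFPageInterpreter`, `PDFPage.get_pages`). The second half rests on assumptions of mine: that each table cell ends up in its own absolutely positioned `<div>` with `left:` and `top:` pixel values in its `style` attribute, and that cells in one column share roughly the same left edge. The names `column_under_header`, `_BoxCollector` and `tolerance` are made up for this example.

```python
import re
from html.parser import HTMLParser


def pdf_to_html(path):
    """Convert a PDF file to an HTML string with pdfminer's HTMLConverter."""
    # Imported lazily so the parsing helpers below work without pdfminer.
    from io import BytesIO
    from pdfminer.converter import HTMLConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    rsrcmgr = PDFResourceManager()
    out = BytesIO()
    device = HTMLConverter(rsrcmgr, out, codec="utf-8", laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    with open(path, "rb") as fp:
        for page in PDFPage.get_pages(fp):
            interpreter.process_page(page)
    device.close()  # writes the closing HTML tags
    return out.getvalue().decode("utf-8")


class _BoxCollector(HTMLParser):
    """Collect (left, top, text) for every positioned <div> (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.boxes = []
        self._pos = None  # (left, top) of the <div> currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        style = dict(attrs).get("style") or ""
        left = re.search(r"left:\s*(-?\d+)px", style)
        top = re.search(r"top:\s*(-?\d+)px", style)
        if left and top:
            self._pos = (int(left.group(1)), int(top.group(1)))
            self._text = []

    def handle_data(self, data):
        if self._pos is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._pos is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            if text:
                self.boxes.append((*self._pos, text))
            self._pos = None


def column_under_header(html, header="company", tolerance=5):
    """Return the texts below `header` that share its left edge, top to bottom.

    Returns None when no box with that header text exists, which doubles as
    the "is the table there?" check.
    """
    parser = _BoxCollector()
    parser.feed(html)
    parser.close()
    heads = [b for b in parser.boxes if b[2].lower() == header.lower()]
    if not heads:
        return None
    head_left, head_top, _ = heads[0]
    below = [
        b for b in parser.boxes
        if abs(b[0] - head_left) <= tolerance and b[1] > head_top
    ]
    return [text for _, _, text in sorted(below, key=lambda b: b[1])]
```

You would call it as `column_under_header(pdf_to_html("report.pdf"))`, where `report.pdf` is a placeholder file name. If pdfminer merges a whole column into a single text box, the entries come back as one string, so you may need to tune `LAParams` or split on line breaks instead.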
