Camelot Cannot extract entire table

Question


506 views Asked by Douglas Griffin At 26 June 2021 at 14:58

I'm using Camelot to extract table information from a PDF that I have converted from scanned to searchable using ocrmypdf (500 dpi).

Camelot seems to be able to identify the table and extract most of the data within the table but it seems to be unable to extract the bottom half. In essence, it sees the top half of the table but seems to be unable to separate the text from the lower half.

This is the table from the PDF in question:

But when I use Camelot's visual debugging and ask it to show me the words it will extract, it seems to recognize the bottom section of the table as one giant block.

Any guidance you can provide on improving Camelot's "vision" here would be helpful.


There is 1 answer

**Tomper** · Answer 1 · 2021-10-26T10:20:00+00:00

Apart from the block, the horizontal lines are also marked as text, which is odd.

Camelot uses pdfminer.six for text extraction, and you can pass LAParams settings (page 16) to camelot.read_pdf() through its layout_kwargs argument to tweak that.
You should also check out camelot.plot(table, kind="grid") to see whether the lines are recognized correctly. If not, that might be where the problem lies.
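
To make that concrete, here is a rough sketch of both suggestions. The file name, the helper names (`extract_tables`, `save_debug_plots`) and the margin values are my own assumptions, not something taken from the question. The numbers are only a starting point for experimenting: pdfminer.six defaults to char_margin=2.0, line_margin=0.5 and word_margin=0.1, and smaller margins generally make it group text less aggressively. Plotting needs matplotlib installed.

```python
# Hypothetical starting values for pdfminer.six LAParams.
# Library defaults are char_margin=2.0, line_margin=0.5, word_margin=0.1.
LAYOUT_KWARGS = {
    "char_margin": 1.0,   # smaller: characters are merged into one line less eagerly
    "line_margin": 0.3,   # smaller: lines are merged into one text box less eagerly
    "word_margin": 0.1,   # left at the default
}


def extract_tables(pdf_path, pages="1", layout_kwargs=None):
    """Read tables with Camelot, forwarding LAParams settings to pdfminer.six."""
    import camelot  # imported here so the helpers load even without Camelot

    chosen = LAYOUT_KWARGS if layout_kwargs is None else layout_kwargs
    return camelot.read_pdf(
        pdf_path,
        pages=pages,
        flavor="lattice",             # the table in the question has ruling lines
        layout_kwargs=dict(chosen),   # passed through to pdfminer's LAParams
    )


def save_debug_plots(table, prefix="debug"):
    """Save Camelot's 'text' and 'grid' debug plots for one table as PNG files."""
    import camelot

    saved = []
    for kind in ("text", "grid"):
        fig = camelot.plot(table, kind=kind)  # returns a matplotlib figure
        path = f"{prefix}_{kind}.png"
        fig.savefig(path)
        saved.append(path)
    return saved


# Example use (scan_ocr.pdf is a placeholder name):
# tables = extract_tables("scan_ocr.pdf")
# save_debug_plots(tables[0])
```

If the "text" plot still shows one giant block after changing the margins, the text layer written by the OCR step is the more likely culprit, and no LAParams setting will split it.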
