Scraping large and complex PDF tables

Question

503 views Asked by pkpto39 At 06 December 2020 at 18:40

I've been trying to scrape some data off of PDFs regarding 2020 election results in California for my own morbid curiosity.

I need to scrape many tables that span many pages. In some cases the rows continue onto the next page, and additional columns appear on other pages as well. I've included a link to one example. I'm comfortable with R, but I can also use Python if that is better for scraping. I haven't found many resources for either language on how to handle tables that carry over onto additional pages. I need to get these tables into CSV or XLSX format.

Thank you in advance!

In this example, pages 15-28 should be one table: https://www.co.tehama.ca.us/images/images/Elections/StatementOfVotesCastNOV2020v2excel.pdf

There is 1 answer

**G5W** · Answer 1 · 2020-12-06T19:30:21+00:00

I was able to get the entire table using the following procedure.

1. Open the PDF in MS Word, not Adobe Acrobat. Word will convert the document.
2. After the conversion has completed, select all. (Both steps may take some time.)
3. Paste into a blank Excel worksheet. Save and enjoy.
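
If a scripted route is preferred over Word, here is a minimal Python sketch of the stitching step. It is not the procedure above, just one possible alternative. It assumes each page's table has already been extracted as a list of rows, and that continuation pages repeat the header row. The function names (`stitch_rows`, `join_columns`, `extract_page_tables`, `write_csv`) are made up for illustration. `extract_page_tables` assumes the third-party `pdfplumber` package is installed, and I have not checked how well it detects the tables in this particular PDF.

```python
import csv
from typing import Iterable, Iterator, List, Optional

Row = List[Optional[str]]


def stitch_rows(pages: Iterable[List[Row]]) -> List[Row]:
    """Concatenate per-page tables, keeping the header row only once."""
    combined: List[Row] = []
    header: Optional[Row] = None
    for table in pages:
        if not table:
            continue
        if header is None:
            # The first non-empty page supplies the header.
            header = table[0]
            combined.append(header)
        # Skip the header when a page repeats it at the top.
        start = 1 if table[0] == header else 0
        combined.extend(table[start:])
    return combined


def join_columns(left: List[Row], right: List[Row], key: int = 0) -> List[Row]:
    """Append the columns of `right` to `left`, matching rows on a key column.

    Rows of `left` with no match in `right` are padded with None.
    """
    lookup = {row[key]: row[:key] + row[key + 1:] for row in right}
    width = len(right[0]) - 1 if right else 0
    return [row + lookup.get(row[key], [None] * width) for row in left]


def extract_page_tables(pdf_path: str, first_page: int, last_page: int) -> Iterator[List[Row]]:
    """Yield one table per page (1-based, inclusive page range).

    Assumption: pdfplumber is installed and finds one table per page.
    """
    import pdfplumber  # imported lazily so the helpers above work without it

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages[first_page - 1:last_page]:
            table = page.extract_table()
            if table:
                yield table


def write_csv(rows: List[Row], out_path: str) -> None:
    """Write the combined rows to a CSV file (None becomes an empty cell)."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
```

For the example in the question, the call would be along the lines of `write_csv(stitch_rows(extract_page_tables("StatementOfVotesCastNOV2020v2excel.pdf", 15, 28)), "table.csv")`. Where extra columns for the same rows show up on other pages, stitch each group of pages separately and combine the results with `join_columns`, using whichever column identifies a row (for example the precinct) as the key.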
