I've been trying to scrape data from PDFs of the 2020 California election results, purely out of my own morbid curiosity.
I need to scrape many tables that span many pages. In some cases the rows continue onto the next page, and additional columns for the same table appear on later pages as well. I've included a link to one example below. I'm comfortable with R, but I can also use Python if it is better suited to scraping. I haven't found many resources, for either language, on how to deal with tables that carry over onto additional pages. I need to get these tables into CSV or XLSX format.
Thank you in advance!
In this example, pages 15-28 should be one table: https://www.co.tehama.ca.us/images/images/Elections/StatementOfVotesCastNOV2020v2excel.pdf
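To make the "rows continue and extra columns show up later" problem concrete, here is a minimal Python sketch of just the stitching step. It assumes each page has already been extracted into a list of rows by some PDF table tool, that the first row of each page is the header, and that the first column is a shared key such as a precinct name. The function names (`stitch_pages`, `table_to_csv`), the column names, and the values are all made up for illustration; none of them come from the linked PDF.

```python
import csv
import io


def stitch_pages(pages):
    """Combine per-page tables into one table.

    Assumptions (hypothetical, adjust to the real data):
    - each page is a list of rows, and row 0 is the header
    - column 0 is a key (e.g. precinct) shared by all pages
    """
    # Pages with an identical header are row continuations: stack them.
    blocks = {}  # header tuple -> data rows, in first-seen order
    for table in pages:
        header, rows = tuple(table[0]), table[1:]
        blocks.setdefault(header, []).extend(rows)

    # Blocks with different headers carry additional columns:
    # join them on the key in column 0.
    merged_header = []
    merged = {}  # key -> {column name: value}
    for header, rows in blocks.items():
        for col in header:
            if col not in merged_header:
                merged_header.append(col)
        for row in rows:
            merged.setdefault(row[0], {}).update(zip(header, row))

    body = [[rec.get(col, "") for col in merged_header] for rec in merged.values()]
    return [merged_header] + body


def table_to_csv(table):
    """Render the stitched table as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(table)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder pages: two pages of rows, then two pages with an extra column.
    example_pages = [
        [["Precinct", "Candidate A", "Candidate B"], ["P1", "10", "20"], ["P2", "30", "40"]],
        [["Precinct", "Candidate A", "Candidate B"], ["P3", "50", "60"]],
        [["Precinct", "Candidate C"], ["P1", "1"], ["P2", "2"]],
        [["Precinct", "Candidate C"], ["P3", "3"]],
    ]
    print(table_to_csv(stitch_pages(example_pages)))
```

The sketch only works if the extraction step gives clean, identical headers on continuation pages and a key column with unique values. If a header is repeated inconsistently or the key is missing on some pages, the grouping would need to be done by page number instead.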
I was able to get the entire table using the following procedure.