Tabula not reading my PDF / all data comes in as blank

32 views Asked by Analyst4 At 17 November 2023 at 16:58

I'm trying to take this PDF: https://www.occ.gov/topics/charters-and-licensing/weekly-bulletin/2023/wb-11052023-11112023.pdf and export it as a CSV with the columns "ACTION", "DATE", "BANK NAME", "LOCATION", "CITY", "STATE".

My code is as follows:

import tabula
import pandas as pd


pdf_path = '*pdf file path*'

# Read the PDF into a list of DataFrames (one per detected table)
dfs = tabula.read_pdf(pdf_path, pages='2', multiple_tables=True)

# Concatenate DataFrames into a single DataFrame
df = pd.concat(dfs)

# Specify the columns to keep
columns_to_keep = ["ACTION", "DATE", "TYPE", "BANK NAME", "LOCATION", "CITY", "STATE"]

# Select only the relevant columns
df = df[columns_to_keep]

# Drop rows with all NaN values
#df = df.dropna(how='all')

# Write the DataFrame to a CSV file
df.to_csv("output.csv", index=False)

print("CSV file generated successfully.")

This generates the headers for my CSV fine, but the data comes in empty. Does anyone have experience with this? Right now I'm just testing with page 2, but ideally I'd want the whole PDF.

I tried the tabula functions, but the output is blank.
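One way to narrow this down, offered as a rough sketch rather than a confirmed fix: look at what each extracted table actually contains before concatenating, and compare tabula-py's `lattice` and `stream` extraction modes. The helper names below (`clean_table`, `summarize`, `extract`, `compare_modes`) are made up for illustration, and whether either mode helps with this particular PDF is an assumption that would need checking. `pages="all"` is how tabula-py is asked for the whole document.

```python
import pandas as pd

# Column names taken from the code in the question.
WANTED = ["ACTION", "DATE", "TYPE", "BANK NAME", "LOCATION", "CITY", "STATE"]


def clean_table(df, wanted=WANTED):
    """Normalize headers, keep the wanted columns that exist, drop empty rows."""
    out = df.copy()
    # Extracted headers may carry stray whitespace or mixed case.
    out.columns = [str(c).strip().upper() for c in out.columns]
    present = [c for c in wanted if c in out.columns]
    return out[present].dropna(how="all")


def summarize(dfs):
    """Return (shape, column names) for each table, for quick inspection."""
    return [(df.shape, list(df.columns)) for df in dfs]


def extract(pdf_path, pages="all", **mode):
    """Thin wrapper around tabula.read_pdf; `mode` is e.g. lattice=True."""
    # Imported lazily so the pandas helpers work without tabula-py and Java.
    import tabula
    return tabula.read_pdf(pdf_path, pages=pages, multiple_tables=True, **mode)


def compare_modes(pdf_path, pages="2"):
    """Print what each extraction mode finds on the given pages."""
    for mode in ({"lattice": True}, {"stream": True}):
        dfs = extract(pdf_path, pages=pages, **mode)
        print(mode, summarize(dfs))


# Example (requires tabula-py and Java):
# compare_modes("*pdf file path*", pages="2")
```

If one of the modes returns tables with real rows, something like `pd.concat([clean_table(t) for t in dfs], ignore_index=True)` followed by `to_csv` would replace the concatenate-and-select steps in the question. If every table comes back with the right headers but only NaN values, the problem is more likely in how the table is detected on the page than in the pandas code.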

