Tabula not reading my PDF: all data comes in blank


I'm trying to take this PDF: https://www.occ.gov/topics/charters-and-licensing/weekly-bulletin/2023/wb-11052023-11112023.pdf and export it as a CSV with the columns "ACTION", "DATE", "BANK NAME", "LOCATION", "CITY", "STATE".

My code is as follows:

import tabula
import pandas as pd


pdf_path = '*pdf file path*'

# Read the PDF into a list of DataFrames (one per detected table)
dfs = tabula.read_pdf(pdf_path, pages='2', multiple_tables=True)

# Concatenate DataFrames into a single DataFrame
df = pd.concat(dfs)

# Specify the columns to keep
columns_to_keep = ["ACTION", "DATE", "TYPE", "BANK NAME", "LOCATION", "CITY", "STATE"]

# Select only the relevant columns
df = df[columns_to_keep]

# Drop rows with all NaN values
#df = df.dropna(how='all')

# Write the DataFrame to a CSV file
df.to_csv("output.csv", index=False)

print("CSV file generated successfully.")

This generates the headers for my CSV correctly, but the data rows come in empty. Does anyone have experience with this? Right now I'm only testing with page 2, but ideally I'd want the whole PDF.

I tried the tabula functions, but the output is blank.
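One way to narrow this down might be to look at each extracted table before concatenating, and to compare tabula-py's two detection modes. The sketch below is only a diagnostic aid, not a confirmed fix for this PDF: `summarize_tables` and `read_with_both_modes` are hypothetical helper names, and it assumes that `lattice=True` (ruled tables) and `stream=True` (whitespace-separated tables) may give different results on this file.

```python
import pandas as pd


def summarize_tables(dfs):
    """Return (shape, column names, non-empty row count) for each extracted table."""
    summary = []
    for df in dfs:
        # Rows where every cell is NaN are the "blank" rows described above.
        non_empty_rows = int(df.dropna(how="all").shape[0])
        summary.append((df.shape, [str(c) for c in df.columns], non_empty_rows))
    return summary


def read_with_both_modes(pdf_path, pages="all"):
    """Extract tables once per tabula-py detection mode so they can be compared."""
    # Imported here so summarize_tables can be used without Java/tabula installed.
    import tabula

    return {
        "lattice": tabula.read_pdf(pdf_path, pages=pages, lattice=True, multiple_tables=True),
        "stream": tabula.read_pdf(pdf_path, pages=pages, stream=True, multiple_tables=True),
    }


# Example usage (needs tabula-py, Java, and the real PDF path):
# for mode, dfs in read_with_both_modes(pdf_path, pages="2").items():
#     print(mode, summarize_tables(dfs))
```

If one mode reports non-empty rows and the other does not, that mode is probably the better starting point; passing `pages="all"` would then extend the same check to the whole PDF.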
