Converting PDF bank statements to Xlsx/CSV file

121 views Asked by At

I'm trying to use tabula and Pandas concat in Python to PDF bank statements into a CSV/Xlsx file, so I can automate the task of manually entering them into Excel, however after intense experimentation and testing, it refuses extract more than 1 table (only their column headers).

All the tables in the PDF have the same columns (Date, Balance, etc.), where I just want to stack them vertically and sort them by date.

Here's how I'm trying to extract and concatenate the tables from the PDF as efficiently as possible:

df = pd.concat(tabula.read_pdf(filename, pages='all', multiple_tables=True))

tabula.convert_into(filename, "Converted Document.csv", output_format="csv", pages='all')

The columns for the other months are being shown as expected, however the transactions are missing for all months except the first.

I've looked at other solutions here, but I haven't found a way to get all transactions showing in the converted Excel file.

0

There are 0 answers