I'm trying to use tabula and Pandas concat
in Python to PDF bank statements into a CSV/Xlsx file, so I can automate the task of manually entering them into Excel, however after intense experimentation and testing, it refuses extract more than 1 table (only their column headers).
All the tables in the PDF have the same columns (Date, Balance, etc.), where I just want to stack them vertically and sort them by date.
Here's how I'm trying to extract and concatenate the tables from the PDF as efficiently as possible:
df = pd.concat(tabula.read_pdf(filename, pages='all', multiple_tables=True))
tabula.convert_into(filename, "Converted Document.csv", output_format="csv", pages='all')
The columns for the other months are being shown as expected, however the transactions are missing for all months except the first.
I've looked at other solutions here, but I haven't found a way to get all transactions showing in the converted Excel file.