PdfPlumber reads tables vertically


When I use PDFPlumber for table extraction, some tables are read vertically, letter by letter, with each letter placed in a different cell, instead of being read horizontally as the text of a single cell.

The table has the following structure (image not included). The extracted output shows that it is not read correctly (image not included).

This is the script:

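import pdfplumber
import pandas as pd
from openpyxl import Workbook

pdf_file_path = "CHILQUINTA.pdf"

def read_enel_0():
    wb = Workbook()
    # Context managers close both the text file and the PDF on exit.
    with open('output.txt', 'w') as archivo, pdfplumber.open(pdf_file_path) as pdf:
        for index, page in enumerate(pdf.pages):
            tables = page.extract_tables()
            # One worksheet per page: Hoja0, Hoja1, ...
            nombre_hoja = 'Hoja' + str(index)
            ws = wb.create_sheet(title=nombre_hoja)
            for table in tables:
                for elemento in table:
                    # The loop body was missing in the post; writing each row
                    # out is an assumed placeholder. The workbook is never
                    # saved, because the post gives no output file name.
                    archivo.write(str(elemento) + '\n')
                    ws.append(elemento)
                #df_temp = pd.DataFrame(rows)
                #data_list = df_temp.values.tolist()
                #for row_data in data_list:

if __name__ == "__main__":
    read_enel_0()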

Is there an argument that could help correct this problem, ideally while still using PDFPlumber?

PS: Tabula reads the table better, but I think I am overlooking some PDFPlumber functionality.

PS2: Example table: https://a.storyblok.com/f/82872/x/06a39e751a/suministro_chilquinta_202307.pdf
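For reference, the kind of argument the question asks about is the table_settings parameter that extract_tables() accepts. The following is only a minimal sketch and has not been tested against the linked PDF: the tolerance values are illustrative starting points, and the names TEXT_STRATEGY_SETTINGS, looks_letter_split and extract_tables_with_fallback are hypothetical helpers, not part of pdfplumber. It retries a page with text-based strategies whenever the default extraction yields a table made mostly of single-character cells. Whether that actually fixes this document is an assumption.

```python
# Sketch only: these settings are untested against the linked PDF.
# The keys are pdfplumber table settings; the values are illustrative guesses.
TEXT_STRATEGY_SETTINGS = {
    "vertical_strategy": "text",    # infer column edges from text alignment
    "horizontal_strategy": "text",  # infer row edges from text alignment
    "snap_tolerance": 5,
    "intersection_tolerance": 5,
}


def looks_letter_split(table, threshold=0.8):
    """Return True if most non-empty cells hold a single character.

    Hypothetical heuristic for the symptom described in the question.
    """
    cells = [c.strip() for row in table for c in row if c and c.strip()]
    if not cells:
        return False
    single = sum(1 for c in cells if len(c) == 1)
    return single / len(cells) >= threshold


def extract_tables_with_fallback(pdf_path, settings=TEXT_STRATEGY_SETTINGS):
    """Extract tables page by page, retrying with `settings` when any
    table on the page looks letter-split. Returns a list of per-page
    table lists."""
    import pdfplumber  # imported lazily so the helpers above work without it

    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables = page.extract_tables()
            if any(looks_letter_split(t) for t in tables):
                tables = page.extract_tables(table_settings=settings)
            results.append(tables)
    return results
```

Calling extract_tables_with_fallback("CHILQUINTA.pdf") would return one list of tables per page. The heuristic keeps the default line-based extraction for pages that already parse well and only switches strategy where the letter-by-letter symptom appears.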


There are 0 answers