Check for existence of an OCR table without using the read_pdf function?

I'm currently using camelot to populate a dictionary like so:

import camelot

# temp_file_path is the path to the PDF being processed (defined elsewhere).
tables = camelot.read_pdf(temp_file_path)
tables_dict = {}

# Iterating an empty result is a no-op, so no explicit tables.n check is needed.
for i, table in enumerate(tables, start=1):
    # Convert each DataFrame row into a dict with string keys.
    table_rows = [
        {str(key): value for key, value in row.items()}
        for row in table.df.to_dict(orient="records")
    ]
    tables_dict[f"Table{i}"] = table_rows

This block runs on every page of the PDFs I use, which are usually around 1,000 pages long. Only a few dozen of those pages contain tables. Is there a way to check whether a page contains a table without calling read_pdf? Calling it on text-only pages slows the application down considerably.
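To make the kind of check I'm after concrete, here is a minimal sketch of a pre-screening pass. It is not a confirmed solution. It assumes pdfplumber is installed and that a page with a table has drawn ruling lines or rectangles, which would only be true for ruled (lattice-style) tables. The helper names `pages_to_spec`, `candidate_pages`, and `read_tables_on_candidates`, and the `min_rulings` threshold, are hypothetical. The only camelot feature used is the `pages` argument of `read_pdf`, which takes a comma-separated string of page numbers and ranges.

```python
from typing import Iterable, List


def pages_to_spec(page_numbers: Iterable[int]) -> str:
    """Collapse 1-based page numbers into a string such as "3,7-9"."""
    ordered = sorted(set(page_numbers))
    if not ordered:
        return ""
    parts: List[str] = []
    start = prev = ordered[0]
    for n in ordered[1:]:
        if n == prev + 1:
            # Still inside a consecutive run; extend it.
            prev = n
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = n
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def candidate_pages(pdf_path: str, min_rulings: int = 4) -> List[int]:
    """Return 1-based numbers of pages with at least min_rulings lines/rects.

    Assumption: tables are drawn with ruling lines. Borderless tables
    would be missed, and decorative lines can cause false positives.
    """
    import pdfplumber  # third-party; imported lazily

    hits: List[int] = []
    with pdfplumber.open(pdf_path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            if len(page.lines) + len(page.rects) >= min_rulings:
                hits.append(number)
    return hits


def read_tables_on_candidates(pdf_path: str):
    """Run camelot only on the pre-screened pages; None if there are none."""
    import camelot  # third-party; imported lazily

    spec = pages_to_spec(candidate_pages(pdf_path))
    if not spec:
        return None
    return camelot.read_pdf(pdf_path, pages=spec)
```

With something like this, `read_tables_on_candidates(temp_file_path)` would replace the direct `camelot.read_pdf(temp_file_path)` call, and the dictionary-building loop would stay the same. I have not measured whether the pre-screening pass is actually cheaper than camelot on text-only pages, so the threshold and the heuristic itself would need tuning against real documents.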

There are 0 answers