srBERt model 'allenai/scibert_scivocab_uncased' pdf reader issue


I am trying to use the srBERt model 'allenai/scibert_scivocab_uncased' on PDFs, in PyCharm with Python 3.9. I have installed transformers and pdfquery, but I run into issues with the PDF tool. I tried a few different ones and none work. Any recommendation on which to use?

I'm a novice at coding, and my code is:

import os

import torch
from pdfquery import PDFQuery
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')
model = AutoModel.from_pretrained('allenai/scibert_scivocab_uncased')

pdf_dir = 'C:\\PhD Local Files P13 SLR Legitimacy\\srBERT\\pdfs\\'

for i in range(1, 3):  # Test run on 1.pdf and 2.pdf; the full set is named 1.pdf, 2.pdf, ..., 170.pdf
    pdf_path = os.path.join(pdf_dir, f'{i}.pdf')
    pdf = PDFQuery(pdf_path)
    pdf.load()
    text_elements = pdf.pq('LTTextLineHorizontal')
    # Skip elements whose text is None; ' '.join raises a TypeError on them
    text = ' '.join(t.text.strip() for t in text_elements if t.text)
    # BERT-style models accept at most 512 tokens, so longer text is truncated
    inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)

I also tried other PDF libraries, including PyPDF2, pdfreader, and pdfreader.six, and none worked.
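One way to narrow this down might be to check the file handling and the text joining separately from the PDF library and the model. The following is a minimal sketch using only the standard library. The helper names `split_existing` and `join_fragments` are hypothetical (they are not part of pdfquery or transformers), and it assumes the files are named 1.pdf, 2.pdf, ... as in the loop above.

```python
from pathlib import Path


def split_existing(pdf_dir, first, last):
    """Return (found, missing) lists of paths for first.pdf .. last.pdf.

    Hypothetical helper: assumes the numbered naming scheme from the question.
    """
    found, missing = [], []
    for i in range(first, last + 1):
        path = Path(pdf_dir) / f"{i}.pdf"
        # Sort each candidate path by whether the file is actually on disk
        (found if path.is_file() else missing).append(path)
    return found, missing


def join_fragments(fragments):
    """Join text fragments with single spaces, skipping None and blank ones."""
    return " ".join(f.strip() for f in fragments if f and f.strip())
```

For example, `split_existing(pdf_dir, 1, 170)` would show which of the expected files are actually present before any PDF library is involved, and `join_fragments` could stand in for the bare `' '.join(...)` so that a fragment with no text does not raise a `TypeError`.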


There are 0 answers