I tried to use the SciBERT model 'allenai/scibert_scivocab_uncased' on PDFs in PyCharm with Python 3.9. I have installed transformers and pdfquery, but I run into problems with the PDF tool. I tried a few different ones and none of them work. Which one would you recommend?
I'm a novice in coding, and my code is:
```python
import os

import torch
from pdfquery import PDFQuery
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')
model = AutoModel.from_pretrained('allenai/scibert_scivocab_uncased')
pdf_dir = 'C:\\PhD Local Files P13 SLR Legitimacy\\srBERT\\pdfs\\'

# The PDFs are named 1.pdf, 2.pdf, ..., 170.pdf; range(1, 3) only processes 1.pdf and 2.pdf
for i in range(1, 3):
    pdf_path = os.path.join(pdf_dir, f'{i}.pdf')
    pdf = PDFQuery(pdf_path)
    pdf.load()
    # Collect the text lines found in the PDF; skip elements whose .text is None
    text_elements = pdf.pq('LTTextLineHorizontal')
    text = ' '.join(t.text for t in text_elements if t.text)
    # SciBERT accepts at most 512 tokens; anything beyond that is cut off here
    inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
```
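Besides the PDF library, a second likely failure point is the model input: SciBERT, like other BERT models, accepts at most 512 tokens, and a full paper is usually much longer. If you want to keep the whole text instead of truncating it, one option is to split the token ids into overlapping windows that each fit the limit. This is a minimal sketch; the function name `sliding_windows` and the default window/stride values are illustrative choices, not part of transformers.

```python
from typing import List


def sliding_windows(token_ids: List[int], window: int = 510, stride: int = 128) -> List[List[int]]:
    """Split a long list of token ids into overlapping windows.

    window=510 leaves room for the [CLS] and [SEP] tokens that BERT-style
    models add, so each chunk fits the 512-token limit once they are added.
    stride is the number of tokens shared by two consecutive windows.
    """
    if window <= stride:
        raise ValueError("window must be larger than stride")
    if not token_ids:
        return []
    step = window - stride  # how far each new window moves forward
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):  # this window reached the end
            break
        start += step
    return chunks
```

With the tokenizer from the question, the ids could come from `tokenizer(text, add_special_tokens=False)['input_ids']`, and each window could be wrapped with `tokenizer.build_inputs_with_special_tokens(chunk)` before being turned into a tensor for the model. Fast tokenizers can also produce such windows themselves via `truncation=True, max_length=512, stride=..., return_overflowing_tokens=True`.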
I tried different PDF libraries, e.g. PyPDF2, pdfreader and pdfminer.six, and none of them worked.
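For plain text extraction, a simpler route than selecting layout elements with pdfquery is a library whose extractor returns page text directly, such as pypdf (the maintained successor of PyPDF2). The sketch below assumes pypdf is installed (`pip install pypdf`); `extract_pdf_text` and `clean_pdf_text` are hypothetical helper names. The cleaning step skips empty pages, re-joins words hyphenated across line breaks and collapses whitespace before the text goes to the tokenizer.

```python
import re
from typing import Iterable, Optional


def clean_pdf_text(pages: Iterable[Optional[str]]) -> str:
    """Join per-page (or per-line) text from a PDF extractor into one string.

    - skips None/empty entries (some extractors return None for empty elements)
    - re-joins words hyphenated across a line break ("legiti-\nmacy")
    - collapses runs of whitespace into single spaces
    """
    text = "\n".join(p for p in pages if p)
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def extract_pdf_text(pdf_path: str) -> str:
    """Extract all page text with pypdf (assumed installed) and clean it."""
    # Imported here so clean_pdf_text can be used without pypdf installed
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return clean_pdf_text(page.extract_text() for page in reader.pages)
```

Re-joining hyphens also merges genuine compounds that happen to break at a line end (for example "state-\nof-the-art" becomes "stateof-the-art"), so drop that line if it does more harm than good for your papers. With pdfminer.six, `from pdfminer.high_level import extract_text` followed by `clean_pdf_text([extract_text(pdf_path)])` would be the equivalent.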