I built a script using textract that reads the content of PDF files. It contains the following function:
    import textract
    import tempfile

    def read_file(bytes):
        with tempfile.NamedTemporaryFile('wb', delete=True) as temp:
            temp.write(bytes)
            temp.flush()
            context = textract.process(temp.name, encoding='utf-8', extension=".pdf")
        return context.decode('utf-8')
This script works locally, but not when deployed to a function app. This is the error message it returns:
`pdf2txt.py /tmp/tmpe3yo9gax` failed because the executable
`pdf2txt.py` is not installed on your system. Please make
sure the appropriate dependencies are installed before using
textract:
http://textract.readthedocs.org/en/latest/installation.html
Both textract and pdf2text are in the requirements.txt of the function app, so they should be installed on deployment. Does anyone have an idea why this does not work? It seems like the pdf2text library refuses to install via pip on the function app.
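One way to narrow this down is to check whether the `pdf2txt.py` executable named in the error is actually reachable on the PATH of the process. The following is only a minimal diagnostic sketch using the standard library; the helper name `executable_on_path` is made up for illustration and is not part of textract.

```python
import shutil


def executable_on_path(name: str) -> bool:
    """Return True if `name` resolves to an executable on the current PATH."""
    # shutil.which returns the full path to the executable, or None if not found.
    return shutil.which(name) is not None


if __name__ == "__main__":
    # Hypothetical check: run this both locally and inside the function app.
    print("pdf2txt.py found:", executable_on_path("pdf2txt.py"))
```

If this prints `False` on the function app but `True` locally, the package may well be installed while its command-line script is simply not on the PATH that textract searches.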
Create an HttpTrigger function with the code below to extract text from a PDF file with textract and PyPDF2.
My function_app.py:-
My requirements.txt:-
My Function Folder with PDF File:-
Deployed this Function successfully:-
When I triggered the url, I received the text from my pdf file:-
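As a rough illustration of the PyPDF2 side of this approach, text can be extracted from the PDF bytes in pure Python, without relying on the `pdf2txt.py` executable. This is only a sketch and not necessarily the exact code used in the function above: it assumes a PyPDF2 version that provides `PdfReader` (2.x or later), and the function name `read_pdf_bytes` is hypothetical.

```python
import io


def read_pdf_bytes(data: bytes) -> str:
    """Extract text from in-memory PDF bytes using PyPDF2 only."""
    # Imported here so the dependency is only needed when the function is called.
    from PyPDF2 import PdfReader

    reader = PdfReader(io.BytesIO(data))
    # extract_text() can yield nothing for pages without a text layer,
    # so fall back to an empty string for those pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Because the bytes are wrapped in `io.BytesIO`, no temporary file is needed. Scanned PDFs without a text layer would still come back empty with this approach.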