I am trying to use the pdf2image library, specifically the convert_from_bytes method, to convert a PDF to a txt file using pytesseract. My app runs locally, but I want to deploy it to Heroku. I have tried adding python-poppler to my Pipfile, but during deployment it fails to download. I have been trying to use the buildpack https://github.com/survantjames/heroku-buildpack-poppler.git; however, when I try to use the app I get this error in the logs:
2021-02-24T02:02:07.068105+00:00 app[web.1]: pages = convert_from_bytes(file,500)
2021-02-24T02:02:07.068106+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 270, in convert_from_bytes
2021-02-24T02:02:07.068124+00:00 app[web.1]: return convert_from_path(
2021-02-24T02:02:07.068131+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 97, in convert_from_path
2021-02-24T02:02:07.068132+00:00 app[web.1]: page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
2021-02-24T02:02:07.068132+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 471, in pdfinfo_from_path
2021-02-24T02:02:07.068133+00:00 app[web.1]: raise PDFPageCountError(
2021-02-24T02:02:07.068133+00:00 app[web.1]: pdf2image.exceptions.PDFPageCountError: Unable to get page count.
2021-02-24T02:02:07.068134+00:00 app[web.1]: pdfinfo: error while loading shared libraries: libpng12.so.0: cannot open shared object file: No such file or directory
What can I do to get Poppler installed on Heroku and working in my app? Thanks!
Solution 1:
Look up which packages contain libpng12.so.0: https://packages.debian.org/search?lang=en&suite=jessie&arch=i386&mode=filename&searchon=contents&keywords=libpng12.so.0
It's libpng12-0:
https://packages.debian.org/en/jessie/libpng12-0
That is the PNG runtime library, which fits exactly what you are doing (creating images from PDFs). It's a dependency required by your Poppler buildpack.
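If the loader complains about another library after this one, you can repeat the lookup mechanically. The sketch below pulls the library name out of a log line like the one in the question and builds the same kind of packages.debian.org search URL as the link above. The function names are hypothetical, and the suite and arch defaults simply mirror that link; Heroku stacks are Ubuntu-based, so you may prefer to search the matching Ubuntu release instead.

```python
import re
from urllib.parse import urlencode

# Matches the dynamic loader message, e.g.
# "pdfinfo: error while loading shared libraries: libpng12.so.0: cannot open ..."
_MISSING_LIB = re.compile(r"error while loading shared libraries: ([^:\s]+):")


def missing_library(log_line):
    """Return the shared library named in a loader error, or None."""
    match = _MISSING_LIB.search(log_line)
    return match.group(1) if match else None


def debian_search_url(filename, suite="jessie", arch="i386"):
    """Build a packages.debian.org contents search URL for a file name.

    suite and arch default to the values used in the link above
    (an assumption); adjust them to the distribution you deploy to.
    """
    params = {
        "lang": "en",
        "suite": suite,
        "arch": arch,
        "mode": "filename",
        "searchon": "contents",
        "keywords": filename,
    }
    # dict order is preserved, so the query string keeps this order
    return "https://packages.debian.org/search?" + urlencode(params)


line = ("pdfinfo: error while loading shared libraries: libpng12.so.0: "
        "cannot open shared object file: No such file or directory")
lib = missing_library(line)
print(lib)  # libpng12.so.0
print(debian_search_url(lib))
```

With the log line from the question, this reproduces the search URL used above.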
You've already managed to install the Python buildpack along with your Poppler buildpack. You need to add a third buildpack that installs the dependencies for Poppler first.
You can use the Apt buildpack: https://github.com/heroku/heroku-buildpack-apt
There you specify an Aptfile with the content libpng12-0. Then the error message about libpng12.so.0 should be gone. New errors may pop up about other missing dependencies, which you can solve the same way.
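Rather than discovering missing libraries one deploy at a time, you can ask the dynamic linker for all of them at once. This is a minimal sketch that assumes pdfinfo is on the dyno's PATH and that ldd is available there (it ships with glibc on Ubuntu-based stacks); you could run it in a one-off dyno, e.g. via heroku run python. The helper names are hypothetical.

```python
import shutil
import subprocess


def parse_ldd_missing(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved entries like "\tlibpng12.so.0 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_poppler_libs(binary="pdfinfo"):
    """Run ldd on a Poppler binary and list its unresolved libraries.

    Returns None if the binary is not on PATH at all. Raises
    FileNotFoundError if ldd itself is not installed.
    """
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return parse_ldd_missing(result.stdout)
```

Each name returned by missing_poppler_libs() is a candidate for an Aptfile entry; an empty list suggests every shared library resolves.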
Solution 2:
Alternatively, you can fix the Poppler buildpack itself: https://github.com/survantjames/heroku-buildpack-poppler
As you can see from the README, it was created for the Cedar-14 stack, which is no longer available. You can attempt to make it compatible with the Heroku-20 stack.
In its compile script (bin/compile) you have to instruct it to install the missing dependencies. Here you can see how dependencies for Calibre were installed.
If the problem is not a missing dependency but the installation of Poppler itself, solution 1 won't work.
Solution 3:
There may already be a Poppler buildpack that supports the Heroku-20 stack. The Poppler buildpack you tried is 5 years old. There are many third-party Poppler buildpacks; e.g. this one is just a little over 1 year old.
You will have to try them out.
I can't tell which solution is right. For example, solution 3 may not be available yet: the Heroku-20 stack is fairly new, and people may not have made their buildpacks compatible with it. The buildpack I linked may only work on the Heroku-18 stack.
Solution 2 always works, but it requires some knowledge of how buildpacks work and how they are created. It is the most "complicated" solution, but also the most elegant one.
Solution 1 will probably, though not necessarily, work, and it is the more straightforward fix.