Unable to install pdftotext on Python 3.6, missing poppler


How can I install pdftotext properly?

I get the error message below when installing pdftotext on Python 3.6. I also tried installing the package manually from the downloaded zip file, but I still got the same error.

  pdftotext/pdftotext.cpp(4): fatal error C1083: Cannot open include file: 'poppler/cpp/poppler-document.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2     

Answer by herve-guerin (best answer)

I found some help in the Readme.md file of the pdftotext package:

1) Install OS dependencies:

On Debian, Ubuntu, and friends:

sudo apt-get update
sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python3-dev

On Fedora, Red Hat, and friends:

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel redhat-rpm-config

2) Do the normal install:

pip install pdftotext

That worked for me.
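Because the README lists pkg-config among the dependencies, one quick pre-flight check is to ask pkg-config whether it can see poppler-cpp before running pip. This is a minimal sketch: poppler_cpp_version is a hypothetical helper, not part of pdftotext, and it assumes the dev package installs a poppler-cpp.pc file, which is what pkg-config --modversion poppler-cpp looks up.

```python
import shutil
import subprocess
from typing import Optional


def poppler_cpp_version() -> Optional[str]:
    """Return the poppler-cpp version reported by pkg-config, or None if unavailable."""
    if shutil.which("pkg-config") is None:
        return None  # pkg-config itself is not installed
    result = subprocess.run(
        ["pkg-config", "--modversion", "poppler-cpp"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # works on Python 3.6 (text= needs 3.7+)
    )
    if result.returncode != 0:
        return None  # pkg-config cannot find poppler-cpp (dev package missing)
    return result.stdout.strip()


if __name__ == "__main__":
    print(poppler_cpp_version())
```

If it prints None, install pkg-config plus libpoppler-cpp-dev (Debian/Ubuntu) or poppler-cpp-devel (Fedora) before retrying pip install pdftotext.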

Answer by Ajay Singh

The command below solved the problem for me:

sudo apt-get install libpoppler-cpp-dev

https://blog.droidzone.in/2018/05/01/install-pdftotext-python-extension-error/

Answer by West

Simple solution for Windows:

  1. Download the poppler archive from http://blog.alivate.com.au/wp-content/uploads/2018/10/poppler-0.68.0_x86.7z and extract it.
  2. Download and install the Visual Studio Build Tools from https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=15
  3. Add the folder \poppler-0.68.0\bin to the PATH environment variable.

That's it. Restart your environment (e.g. Jupyter Notebook or VS Code) so it picks up the new PATH. Enjoy!
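To confirm that the PATH change actually reached the restarted environment, you can look for a file from the poppler bin folder on PATH. A minimal sketch: find_on_path is a hypothetical helper, and it assumes the extracted bin folder contains the pdftotext.exe command-line tool, as poppler's Windows builds usually do.

```python
import os
from typing import Optional


def find_on_path(filename: str, path_value: Optional[str] = None) -> Optional[str]:
    """Return the first PATH directory that contains `filename`, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        # Skip empty entries (e.g. from a trailing ';') and check for the file.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


if __name__ == "__main__":
    # Prints the poppler bin folder if PATH is set up, otherwise None.
    print(find_on_path("pdftotext.exe"))
```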

Answer by Sami

For Ubuntu users:

sudo apt-get install libpoppler58=0.41.0-0ubuntu1 libpoppler-dev libpoppler-cpp-dev

This worked for me.

Answer by Dasma

And for macOS:

brew install poppler

brew install pkg-config poppler python

Answer by Martin Graupner

To install pdftotext on Windows 10, I followed Jason Woods' answer.

I want to add that you also need the "Desktop development with C++" workload installed in Visual Studio.

Make sure to install the "C++ Build Tools" as well, as mentioned in Jason Woods' answer.

Follow the rest of his answer. Quick summary:

  • install Anaconda Python
  • in the Anaconda Prompt, type: conda install -c conda-forge poppler
  • now install the pdftotext package: pip install pdftotext

It worked for me. Thank you.

Answer by Jason Woods

I've been trying to figure out how to install pdftotext on Win10 for a few days, and Internet searches turned up nothing. So for those who need to know, here's how to install pdftotext on Win10 with Anaconda. YMMV.

Install Anaconda Python. There are many articles on installing Anaconda, so I won't explore that here.

If you try to run pip install pdftotext now, you will get an error saying that Microsoft Visual C++ is required.

Navigate in a browser to http://visualstudio.microsoft.com/downloads. Under Tools for Visual Studio 2019, download the Build Tools for Visual Studio 2019. Then install the tools by checking the C++ build tools option box and clicking Install.

pip install should now get past the VC++ error. Unfortunately, you’ll now get the error “Cannot open include file: ‘poppler/cpp/poppler-document.h’”. This is because you’re missing the poppler libraries.

Head back to the internets! You’ll need poppler for Windows. At the time of this writing, your best option is http://blog.alivate.com.au/poppler-windows. Grab the latest binary and uncompress it. If you look at the error, pip is looking for the header file at {Anaconda3 directory}\include\poppler\cpp\poppler-document.h. Now look in the archive you just extracted: its include folder contains a poppler directory, and inside that directory's cpp subdirectory you’ll find the poppler-document.h file.

I copied the entire poppler directory into the Anaconda3\include folder, so you should do the same.
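The copy step can also be scripted. A minimal sketch, assuming you run it with the same Anaconda Python that pip builds against (so sys.prefix is the {Anaconda3 directory}) and that poppler_root is wherever you extracted the archive; copy_poppler_headers is a hypothetical helper, not part of any package.

```python
import os
import shutil
import sys


def copy_poppler_headers(poppler_root: str, python_prefix: str = sys.prefix) -> str:
    """Copy <poppler_root>/include/poppler to <python_prefix>/include/poppler."""
    src = os.path.join(poppler_root, "include", "poppler")
    dest = os.path.join(python_prefix, "include", "poppler")
    # Sanity check: this is the header the compiler error complains about.
    if not os.path.isfile(os.path.join(src, "cpp", "poppler-document.h")):
        raise FileNotFoundError("poppler-document.h not found under " + src)
    # copytree creates missing parent folders; it raises FileExistsError
    # if a poppler folder is already there, so remove an old copy first.
    shutil.copytree(src, dest)
    return dest


if __name__ == "__main__":
    # Hypothetical extraction folder; adjust to where you unpacked the archive.
    print(copy_poppler_headers(r"C:\poppler-0.68.0"))
```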

If you run pip install again, you'll still get a ton of errors! But these are not the errors you saw previously; this time the build is looking for a missing linked library, poppler-cpp.lib. A search through Conda installs on another machine found this file in the poppler package, so run:

conda install -c conda-forge poppler

This installs the poppler-cpp.lib file. Then copy the file from its home at {Anaconda3 directory}\Library\lib\poppler-cpp.lib and paste it where pdftotext expects it, at {Anaconda3 directory}\libs.
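This copy can be scripted too. Again a hypothetical helper, assuming sys.prefix is the Anaconda3 directory and that conda placed poppler-cpp.lib under Library\lib as described above:

```python
import os
import shutil
import sys


def copy_poppler_cpp_lib(prefix: str = sys.prefix) -> str:
    """Copy <prefix>/Library/lib/poppler-cpp.lib into <prefix>/libs and return the new path."""
    src = os.path.join(prefix, "Library", "lib", "poppler-cpp.lib")
    dest_dir = os.path.join(prefix, "libs")  # where the linker looks for .lib files
    os.makedirs(dest_dir, exist_ok=True)
    # copy2 returns the full destination path when dest_dir is a directory.
    return shutil.copy2(src, dest_dir)


if __name__ == "__main__":
    print(copy_poppler_cpp_lib())
```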

If we do a pip install pdftotext again, there it is! I’m sure someone will find a way to refine this a bit, but for now we have a working pdftotext Python library on Win10.

These directions can be found, with screenshots, at my blog https://coder.haus/2019/09/27/installing-pdftotext-through-pip-on-windows-10/
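Whichever route you take, a quick way to confirm the install is to check that Python can find the compiled module and then extract text from a PDF. A minimal sketch: pdftotext.PDF, built from a file opened in binary mode and iterable over pages, is the package's documented entry point, while the helper names below are illustrative.

```python
import importlib.util
from typing import List


def pdftotext_available() -> bool:
    """Return True if Python can locate the pdftotext extension module.

    find_spec only locates the module; actually importing it is what
    loads poppler's libraries, so extract_pages is the real end-to-end test.
    """
    return importlib.util.find_spec("pdftotext") is not None


def extract_pages(pdf_path: str) -> List[str]:
    """Return the text of each page in the PDF at pdf_path."""
    import pdftotext  # imported lazily so this file loads even if it is missing

    with open(pdf_path, "rb") as f:
        pdf = pdftotext.PDF(f)
        return list(pdf)  # one string per page


if __name__ == "__main__":
    print("pdftotext found" if pdftotext_available() else "pdftotext not found")
```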