Dependency hell with beautifulsoup4 and lxml


I have built a small utility using Python 3.8. Among other things it extracts some data from XML files using beautifulsoup4 and lxml. I use PyCharm and virtualenv for development and my utility works just fine.

In order to distribute the util to others, I have a build script that copies my code to a dist directory and installs all dependencies into that directory using pip install -r requirements.txt -t dist. This also works fine, and I can run the code in the dist directory from my system interpreter (3.8, with neither beautifulsoup4 nor lxml installed). So the dependencies can apparently be loaded from dist.
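For reference, the dependency step of such a build script can be sketched in Python roughly as follows. The function names are made up for this illustration, and invoking pip through sys.executable -m pip is an assumption; the real build script may call pip differently.

```python
import subprocess
import sys


def pip_install_command(requirements="requirements.txt", target="dist"):
    """Build the argument list for: pip install -r <requirements> -t <target>."""
    # Using sys.executable ensures pip runs under the same interpreter as this script.
    return [sys.executable, "-m", "pip", "install", "-r", requirements, "-t", target]


def install_dependencies(requirements="requirements.txt", target="dist"):
    """Install all requirements into the target directory (hypothetical helper)."""
    # check=True raises CalledProcessError if pip exits with a non-zero status.
    subprocess.run(pip_install_command(requirements, target), check=True)
```

Note that pip picks packages that match the interpreter and platform it runs on, so the contents of dist depend on the machine that runs the build.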

It doesn't work on other machines, though. The script produces this error message:

Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

This means that beautifulsoup4 can't find lxml (the same happens with "lxml-xml" or "xml"). The dependencies in the dist directory appear to be correct, though; nothing seems to be missing. I get the same error when I package the script as a zip app using python -m zipapp -p "python" dist, which yields a file dist.pyz. That file can be executed but runs into the same error message, even on my own machine.
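One way to narrow this down is to import lxml.etree directly and look at the real exception. As far as I can tell, beautifulsoup4 only registers its lxml tree builder when that import succeeds, so a failing import would surface as the "Couldn't find a tree builder" message. The helper below is a minimal sketch; probe_import is a made-up name, not part of either library.

```python
import importlib


def probe_import(module_name):
    """Try to import module_name and return (ok, detail).

    detail is the module's file location on success,
    or the exception type and message on failure.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__file__", None) or "<built-in>"


if __name__ == "__main__":
    # Report which of the relevant modules can be loaded and from where.
    for name in ("lxml", "lxml.etree", "bs4"):
        ok, detail = probe_import(name)
        print(f"{name:12} {'OK' if ok else 'FAILED'}  {detail}")
```

Running this from the dist directory and from inside the zip app on an affected machine should show whether lxml loads at all and from which path.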

This is my requirements.txt file:

beautifulsoup4
jinja2
lxml

And this is the instantiation of the BeautifulSoup parser:

soup = BeautifulSoup(xml_data, features='lxml')

xml_data is just a string containing some valid XML that is read from a file generated by another tool.

I am out of ideas. I have lots of experience with .NET and Java but am not the greatest Python coder on the planet. It seems that I have entered the Python version of dependency hell... I really don't want to have users of the scripts invoke pip install lxml on their machines. I want to distribute a self-contained app with all dependencies.

Any help is appreciated.

Update

Changing the order of the entries in requirements.txt makes no difference (I had hoped it might).

I added

from lxml.builder import ElementMaker
...
e = ElementMaker()

to the main script in order to import lxml into my script. This yields the error

Traceback (most recent call last):
  File "C:\Program Files\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "dist.pyz\__main__.py", line 4, in <module>
  File "<frozen zipimport>", line 259, in load_module
  File "dist.pyz\lrg.py", line 3, in <module>
  File "<frozen zipimport>", line 259, in load_module
  File "dist.pyz\lxml\builder.py", line 44, in <module>
ModuleNotFoundError: No module named 'lxml.etree'

when run as a zip app but works fine from my IDE that uses a virtualenv.
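A possible explanation worth checking: lxml.etree is a compiled extension module (a .pyd file on Windows, a .so file elsewhere), and the zipimport documentation states that such dynamic modules cannot be imported from a ZIP archive. In addition, a compiled module is only picked up if its file name suffix matches the running interpreter. The sketch below lists the compiled modules in a directory and marks whether their suffix matches the current interpreter. find_extension_modules is a hypothetical helper, and the suffix comparison is a simplification of what the import system really does.

```python
import importlib.machinery
from pathlib import Path

# File endings used for compiled extension modules on common platforms.
BINARY_SUFFIXES = (".pyd", ".so")


def find_extension_modules(root):
    """Return (relative_path, matches_this_interpreter) for compiled modules under root."""
    accepted = importlib.machinery.EXTENSION_SUFFIXES
    results = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.name.endswith(BINARY_SUFFIXES):
            # Module names contain no dots, so everything from the first dot
            # onward is the suffix the import system has to recognise.
            suffix = path.name[path.name.index("."):]
            results.append((path.relative_to(root).as_posix(), suffix in accepted))
    return results


if __name__ == "__main__":
    for rel_path, matches in find_extension_modules("dist"):
        print(f"{'match   ' if matches else 'MISMATCH'}  {rel_path}")
```

If this reports compiled files under dist/lxml, the zip app cannot load them directly, and a mismatch on another machine would point to a different Python version or platform than the one that ran pip.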
