How to include NLTK WordNet in a PyPI package


I have a Python package that uses NLTK's WordNetLemmatizer. The package requires users to have nltk version 3.7 installed and the 'wordnet' corpus downloaded in order to function correctly.

I'm including the WordNet files in my Python package. Here is the tree view of my package:

├── LICENSE
├── MANIFEST.in
├── README.md
├── englishidioms
│   ├── L_algorithm.py
│   ├── __init__.py
│   ├── phrases.json
│   └── wordnet
│       ├── LICENSE
│       ├── README
│       ├── adj.exc
│       ├── adv.exc
│       ├── citation.bib
│       ├── cntlist.rev
│       ├── data.adj
│       ├── data.adv
│       ├── data.noun
│       ├── data.verb
│       ├── index.adj
│       ├── index.adv
│       ├── index.noun
│       ├── index.sense
│       ├── index.verb
│       ├── lexnames
│       ├── noun.exc
│       └── verb.exc
└── setup.py

I've also created the following function so that NLTK can find the correct path for WordNet:

import os

import nltk


def set_nltk_wordnet_path():
    # Locate the bundled "wordnet" directory inside the installed package
    wordnet_dir = os.path.join(os.path.dirname(__file__), "wordnet")

    # Add that directory to NLTK's resource search path
    nltk.data.path.append(wordnet_dir)

I made sure to call this function before ever using WordNetLemmatizer in my code.

Now, when I try to use my package, I get the following error:

(10venv) E:\python>python
Python 3.12.1 (tags/v3.12.1:2305ca5, Dec  7 2023, 22:03:25) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from englishidioms import find_idioms
>>> sentence = "The plan didn't work, but I'll give you an a for effort for trying."
>>> results = find_idioms(sentence, limit=1)
Traceback (most recent call last):
  File "E:\python\10venv\Lib\site-packages\nltk\corpus\util.py", line 84, in __load
    root = nltk.data.find(f"{self.subdir}/{zip_name}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\python\10venv\Lib\site-packages\nltk\data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/wordnet.zip/wordnet/

  Searched in:
    - 'C:\\Users\\Hawker/nltk_data'
    - 'E:\\python\\10venv\\nltk_data'
    - 'E:\\python\\10venv\\share\\nltk_data'
    - 'E:\\python\\10venv\\lib\\nltk_data'
    - 'C:\\Users\\Hawker\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'E:\\python\\10venv\\Lib\\site-packages\\englishidioms\\wordnet'
**********************************************************************


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "E:\python\10venv\Lib\site-packages\englishidioms\L_algorithm.py", line 533, in find_idioms
    potential_matches = get_potential_matches(sentence)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\python\10venv\Lib\site-packages\englishidioms\L_algorithm.py", line 217, in get_potential_matches
    WordNetLemmatizer().lemmatize(word, "v")
  File "E:\python\10venv\Lib\site-packages\nltk\stem\wordnet.py", line 45, in lemmatize
    lemmas = wn._morphy(word, pos)
             ^^^^^^^^^^
  File "E:\python\10venv\Lib\site-packages\nltk\corpus\util.py", line 121, in __getattr__
    self.__load()
  File "E:\python\10venv\Lib\site-packages\nltk\corpus\util.py", line 86, in __load
    raise e
  File "E:\python\10venv\Lib\site-packages\nltk\corpus\util.py", line 81, in __load
    root = nltk.data.find(f"{self.subdir}/{self.__name}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\python\10venv\Lib\site-packages\nltk\data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/wordnet

  Searched in:
    - 'C:\\Users\\Hawker/nltk_data'
    - 'E:\\python\\10venv\\nltk_data'
    - 'E:\\python\\10venv\\share\\nltk_data'
    - 'E:\\python\\10venv\\lib\\nltk_data'
    - 'C:\\Users\\Hawker\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'E:\\python\\10venv\\Lib\\site-packages\\englishidioms\\wordnet'
**********************************************************************

>>>

For some reason NLTK can't load the WordNet resources, even though they are located in E:\python\10venv\Lib\site-packages\englishidioms\wordnet.

How can I fix this?
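One likely cause, judging from the traceback: NLTK is asked for the resource name corpora/wordnet and joins that name onto each entry in nltk.data.path. The entry added here is the wordnet directory itself, so NLTK ends up looking for ...\englishidioms\wordnet\corpora\wordnet, which does not exist. The following is a minimal sketch of one possible fix, not a confirmed solution. It assumes the bundled files are moved to englishidioms/nltk_data/corpora/wordnet/ (the nltk_data directory name and the two helper functions are hypothetical, chosen for illustration) and that the parent of corpora is what gets registered.

```python
import os


def bundled_nltk_data_dir(package_dir):
    """Return the directory that should be added to nltk.data.path.

    NLTK resolves "corpora/wordnet" relative to each search-path entry,
    so the entry must be the parent of "corpora", not the wordnet
    directory itself. The "nltk_data" name is an assumption.
    """
    return os.path.join(package_dir, "nltk_data")


def has_wordnet(data_dir):
    """Check that data_dir/corpora/wordnet contains a WordNet index file."""
    wordnet_dir = os.path.join(data_dir, "corpora", "wordnet")
    return os.path.isfile(os.path.join(wordnet_dir, "index.noun"))


def set_nltk_wordnet_path(package_dir=None):
    """Register the bundled data directory with NLTK and return it."""
    import nltk  # imported lazily so the helpers above work without NLTK

    if package_dir is None:
        package_dir = os.path.dirname(os.path.abspath(__file__))

    data_dir = bundled_nltk_data_dir(package_dir)
    if not has_wordnet(data_dir):
        # Fail early with a clear message if the files were not packaged
        raise FileNotFoundError(
            f"Bundled WordNet files not found under {data_dir}"
        )

    # Put the bundled copy first so it is found before any other location
    if data_dir not in nltk.data.path:
        nltk.data.path.insert(0, data_dir)
    return data_dir
```

With that layout, calling set_nltk_wordnet_path() before the first use of WordNetLemmatizer should let the lookup for corpora/wordnet succeed. The has_wordnet check is also a quick way to confirm that the data files were actually shipped with the installed package, which depends on how MANIFEST.in and setup.py are configured.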
