I am having serious issues with the nltk.download() method. Specifically, I am struggling with an SSL certificate problem (the solutions posted in other threads don't seem to work, probably because of a problem with pip) and with the handling of the downloaded .zip file.
Operating system: macOS Sonoma 14.2.1. IDE: PyCharm. Interpreter: Pycharm/Projects/*Projectname*/.venv/bin/python (alternative: usr/local/bin/python3.12; the choice of interpreter doesn't matter, as the same problems arise with both).
1- I have installed NLTK with PyCharm in the virtualenv of the project. (Note: I also have NLTK installed in the local library.)
2- In the Python console, I can successfully import NLTK, as well as nltk.corpus. However, calling nltk.download(*corpus*) raises a well-known error (discussed in other threads, e.g. SSL error downloading NLTK data):
[nltk_data] Error loading wordnet: <urlopen error [SSL:
[nltk_data] CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data] unable to get local issuer certificate (_ssl.c:1000)>
False
3- I then attempt to resolve this issue by following the suggestions from Stack Overflow:
3.a) first method: Install Certificates.command, i.e. running the following terminal command:
/Applications/Python/Install Certificates.command
or wherever the relevant Install Certificates.command is located. I have tried both running it from the command line and double-clicking the file, and this is the error raised in the terminal window that opens automatically:
/Applications/Python\ 3.12/Install\ Certificates.command ; exit;
-- pip install --upgrade certifi
/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12: No module named pip
Traceback (most recent call last):
File "<stdin>", line 44, in <module>
File "<stdin>", line 24, in main
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12', '-E', '-s', '-m', 'pip', 'install', '--upgrade', 'certifi']' returned non-zero exit status 1.
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.
The key message is "No module named pip". How can I fix this? When I install Python packages I always use the pip3 command: is it possible that the Install Certificates.command script fails because it calls pip instead of pip3?
3.b) second method: disabling the SSL check by running:
import nltk
import ssl
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
nltk.download()
This seems to work: it automatically downloads a .zip file into a new directory, /Users/*username*/nltk_data (thus, not in the virtual env). But if I try to do anything with WordNet (the data downloaded), e.g. this sample code:
import nltk
from nltk.corpus import wordnet as wn
wn.synset('dog')
it raises the error:
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevconsole.py", line 364, in runcode
coro = func()
^^^^^^
File "<input>", line 1, in <module>
File "/Users/username/PycharmProjects/projectname/.venv/lib/python3.12/site-packages/nltk/corpus/util.py", line 121, in __getattr__
self.__load()
File "/Users/username/PycharmProjects/projectname/.venv/lib/python3.12/site-packages/nltk/corpus/util.py", line 81, in __load
root = nltk.data.find(f"{self.subdir}/{self.__name}")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/username/PycharmProjects/projectname/.venv/lib/python3.12/site-packages/nltk/data.py", line 555, in find
return find(modified_name, paths)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/username/PycharmProjects/projectname/.venv/lib/python3.12/site-packages/nltk/data.py", line 542, in find
return ZipFilePathPointer(p, zipentry)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/username/PycharmProjects/projectname/.venv/lib/python3.12/site-packages/nltk/compat.py", line 41, in _decorator
return init_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/username/PycharmProjects/projectname/.venv/lib/python3.12/site-packages/nltk/data.py", line 394, in __init__
zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/username/PycharmProjects/projectname/.venv/lib/python3.12/site-packages/nltk/compat.py", line 41, in _decorator
return init_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/username/PycharmProjects/projectname/.venv/lib/python3.12/site-packages/nltk/data.py", line 935, in __init__
zipfile.ZipFile.__init__(self, filename)
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/zipfile/__init__.py", line 1341, in __init__
self._RealGetContents()
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/zipfile/__init__.py", line 1408, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
And I am stuck.
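Since the traceback ends in BadZipFile, one thing worth checking is whether the downloaded archives are intact. The following is only a diagnostic sketch, not a fix: it uses the standard-library zipfile.is_zipfile and assumes the data landed under ~/nltk_data as described above. The helper name check_zip_files is made up for illustration.

```python
import zipfile
from pathlib import Path


def check_zip_files(data_dir):
    """Map every .zip under data_dir to True (readable archive) or False (not a zip)."""
    root = Path(data_dir).expanduser()
    if not root.is_dir():
        # Nothing to check if the directory does not exist.
        return {}
    return {str(p): zipfile.is_zipfile(p) for p in sorted(root.rglob("*.zip"))}


if __name__ == "__main__":
    # Assumption: ~/nltk_data is the download directory mentioned above.
    for path, ok in check_zip_files("~/nltk_data").items():
        print("OK     " if ok else "CORRUPT", path)
```

If a file is reported as corrupt, the download was probably interrupted or replaced by an error page, so deleting it and downloading again would be the next thing to try.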
One last strange behavior: the PyCharm interpreter is Python 3.12, but in the terminal python --version prints Python 3.9.6, whereas python3 --version prints Python 3.12.2. NOTE: I have never installed Python 3.9 myself; I think it is a version pre-installed on the Mac (?).
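To see which interpreter is actually running, whether it can import pip, and whether it finds a CA certificate file, a small report like the one below can be run from both the PyCharm console and the terminal. It is a sketch using only the standard library; the function name interpreter_report is hypothetical.

```python
import importlib.util
import ssl
import sys


def interpreter_report():
    """Summarize the running interpreter, pip availability, and CA file location."""
    verify_paths = ssl.get_default_verify_paths()
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # True when running inside a virtual environment created by venv/virtualenv.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # False reproduces the "No module named pip" situation for this interpreter.
        "pip_available": importlib.util.find_spec("pip") is not None,
        # None means the default CA file OpenSSL expects does not exist on disk.
        "ca_file": verify_paths.cafile,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

Comparing the output across interpreters should show whether the Python missing pip is the same one PyCharm uses. If pip_available is False, the standard-library ensurepip module (python3 -m ensurepip) may be able to bootstrap it, though I have not confirmed that this resolves the certificate script.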
Where do you think is the origin of the problem?
I would be very happy if someone could help me get NLTK working properly in PyCharm (I am very tired of coding on Google Colab).
Moreover,
How can I fix the "No module named pip" issue (and thus, potentially, the certificate issue)?
Where does the nltk_data directory need to be located (it is not created when I install the nltk package)? And where should the .zip file go?
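On the location question, NLTK exposes the directories it searches as the list nltk.data.path, so printing that list from the same interpreter shows where a data directory would be found. This is a minimal sketch; the wrapper name nltk_search_paths is hypothetical, and it returns None when NLTK is not importable.

```python
def nltk_search_paths():
    """Return the directories NLTK searches for data, or None if NLTK is missing."""
    try:
        import nltk.data
    except ImportError:
        return None
    # nltk.data.path is a plain list of directory names, checked in order.
    return list(nltk.data.path)


if __name__ == "__main__":
    paths = nltk_search_paths()
    if paths is None:
        print("NLTK is not installed for this interpreter")
    else:
        for directory in paths:
            print(directory)
```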
Thank you very much in advance!