I'm running spaCy 3.7.2 on Databricks on AWS in a network-limited environment, and I get an error when trying to initialize/use contextualSpellCheck. To work around what looks like a network issue, I installed en_core_web_sm-3.7.1-py3-none-any.whl in the cluster environment, but I'm not sure how to point my code at that local copy (I'm new to spaCy).
I'm also getting a "local variable 'model' referenced before assignment" error.
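For reference, here's a stdlib-only snippet I've been using to check whether the wheel-installed model is actually importable from the notebook environment, and where it lives. My understanding (which may be wrong) is that spacy.load() accepts either the package name or a filesystem path to the model data directory, which sits in a versioned subfolder under the package directory printed here:

```python
import importlib.util

def find_model_path(package_name):
    """Return the install directory of a pip-installed package, or None."""
    spec = importlib.util.find_spec(package_name)
    # Plain modules and missing packages have no search locations
    if spec is None or spec.submodule_search_locations is None:
        return None
    return list(spec.submodule_search_locations)[0]

# Directory containing the en_core_web_sm package, or None if the
# wheel isn't visible from this Python environment:
print(find_model_path("en_core_web_sm"))
```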
Code:

```python
import contextualSpellCheck
import spacy

nlp = spacy.load("en_core_web_sm")
print(f"Model Name: {nlp.meta['name']}")
print(f"Model Lang: {nlp.meta['lang']}")
print(f"Model Version: {nlp.meta['version']}")
nlp.pipe_names

contextualSpellCheck.add_to_pipe(nlp)
nlp.pipe_names

# "milion" misspellings are intentional test input for the spell checker
doc = nlp('Income was $9.4 milion compared to the prior year of $2.7 milion.')
doc._.outcome_spellCheck
```
Error:

```
local variable 'model' referenced before assignment
ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 3a762e01-c83f-4766-a16f-5fa3dcfe7d52)')
---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
File /databricks/python/lib/python3.10/site-packages/urllib3/connectionpool.py:386, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    385 try:
--> 386     self._validate_conn(conn)
    387 except (SocketTimeout, BaseSSLError) as e:
    388     # Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.

File /databricks/python/lib/python3.10/site-packages/urllib3/connectionpool.py:1042, in HTTPSConnectionPool._validate_conn(self, conn)
   1041 if not getattr(conn, "sock", None):  # AppEngine might not have `.sock`
-> 1042     conn.connect()
   1044 if not conn.is_verified:

File /databricks/python/lib/python3.10/site-packages/urllib3/connection.py:414, in HTTPSConnection.connect(self)
    412     context.load_default_certs()
--> 414 self.sock = ssl_wrap_socket(
    415     sock=conn,
    416     keyfile=self.key_file,
    417     certfile=self.cert_file,
    418     key_password=self.key_password,
    419     ca_certs=self.ca_certs,
```
I installed the .whl file for the model as a cluster library, then decomposed the code and ran each line in a separate cell to debug. `nlp.pipe_names` runs fine; the problem seems to start with:

```python
contextualSpellCheck.add_to_pipe(nlp)
```

Judging from the traceback, this call tries to fetch a model from huggingface.co, which times out in my restricted network, and the failed download presumably leaves the local variable `model` unassigned.
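One thing I've been experimenting with, since the traceback points at huggingface.co: transformers and huggingface_hub both document offline-mode environment variables, so if the model contextualSpellCheck wants is already in the local Hugging Face cache, setting these before the imports might avoid the network call entirely (whether contextualSpellCheck itself honors them is my assumption):

```python
import os

# Documented transformers/huggingface_hub switches: when set to "1",
# the libraries use only locally cached files and make no network calls.
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"

# These must be set before the imports that trigger the download, e.g.:
# import contextualSpellCheck
# import spacy
# nlp = spacy.load("en_core_web_sm")
# contextualSpellCheck.add_to_pipe(nlp)
```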
Any and all thoughts gratefully received.