I'm trying to run a PySpark job in a local environment.
After setting up pipenv and installing the module (numpy) successfully, the module is still not visible to the code.
Using pip to install the library instead of pipenv works. What am I missing here?
The terminal output is shown below.
PS C:\Users\user\Desktop\spark\test> pipenv shell
Shell for C:\Users\user\.virtualenvs\test-sCQB0P3C already activated.
No action taken to avoid nested environments.
PS C:\Users\user\Desktop\spark\test> pipenv graph
numpy==1.20.3
pipenv==2020.11.15
- certifi [required: Any, installed: 2020.12.5]
- pip [required: >=18.0, installed: 21.1.1]
- setuptools [required: >=36.2.1, installed: 56.0.0]
- virtualenv [required: Any, installed: 20.4.6]
- appdirs [required: >=1.4.3,<2, installed: 1.4.4]
- distlib [required: >=0.3.1,<1, installed: 0.3.1]
- filelock [required: >=3.0.0,<4, installed: 3.0.12]
- six [required: >=1.9.0,<2, installed: 1.16.0]
- virtualenv-clone [required: >=0.2.5, installed: 0.5.4]
pyspark==2.4.0
- py4j [required: ==0.10.7, installed: 0.10.7]
PS C:\Users\user\Desktop\spark\test> spark-submit --master local[*] --files configs\etl_config.json jobs\etl_job.py
Traceback (most recent call last):
File "C:/Users/user/Desktop/spark/test/jobs/etl_job.py", line 40, in <module>
from dependencies.class import XLoader
File "C:\Users\user\Desktop\spark\test\dependencies\X.py", line 2, in <module>
import numpy as np
ModuleNotFoundError: No module named 'numpy'
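
One way to check which interpreter spark-submit is actually launching (a quick debugging sketch I could add near the top of jobs\etl_job.py, not part of the job as it stands) would be:

import sys
print(sys.executable)  # path of the Python interpreter running this job
print(sys.path)        # directories searched for modules such as numpy

If that path points at the system Python rather than C:\Users\user\.virtualenvs\test-sCQB0P3C, that would explain why numpy is not found.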
Make sure you are in the same directory as your Pipfile, then run
pipenv shell
and then
pipenv install
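
For example, from the project folder shown in the question (assuming that is where the Pipfile lives):

PS C:\Users\user\Desktop\spark\test> pipenv shell
PS C:\Users\user\Desktop\spark\test> pipenv install numpy

With the shell activated from that directory, the packages recorded in the Pipfile should be installed into the same virtualenv the project uses.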