Access custom module in Azure ML subprocess / Edit PYTHONPATH inside of Azure ML pipeline


I have the following project structure:

    .
    ├── my_custom_module
    │   └── __init__.py
    │   └── ...
    ├── scripts
    │   ├── start_script.py
    │   └── example.py

I am running start_script.py inside an Azure ML studio pipeline. Inside start_script.py I need to run example.py by using:

subprocess.run(['python3.9', "scripts/example.py"], check=True).

example.py on the other hand needs access to my_custom_module (from my_custom_module import some_class).

I keep getting ModuleNotFoundError: No module named my_custom_module errors, because the module is not added to the PYTHONPATH.

How do I add a custom module to the PYTHONPATH inside Azure ML, such that it is visible to a subprocess?

Here is some debug information (I shortened some hash codes for better readability):

#os.getcwd() inside start_script.py and example.py returns:    
/mnt/azureml/cr/j/e9e/exe/wd 

# printing sys.path inside start_script.py and example.py returns:
/mnt/azureml/cr/j/21d/exe/wd/scripts
/azureml-envs/azureml_e9e/lib/python39.zip
/azureml-envs/azureml_e9e/lib/python3.9
/azureml-envs/azureml_e9e/lib/python3.9/lib-dynload
/azureml-envs/azureml_e9e/lib/python3.9/site-packages
/mnt/azureml/cr/j/21d/exe/wd
/azureml-envs/azureml_e9e/lib/python3.9/site-packages/azureml/_project/vendor

# os.system(which python) inside start_script.py:
/azureml-envs/azureml_e9e/bin/python 
# os.system(which python) inside example.py returns nothing

So far I have tried to add my_custom_module to the PYTHONPATH inside start_script.py, so that example.py can import it, by using:

os.system(f"export PYTHONPATH=$PYTHONPATH:{os.getcwd()}") # tested also without "$PYTHONPATH:"
os.system(f"export PYTHONPATH=$PYTHONPATH:{os.getcwd() + '/my_custom_module'}") # tested also without "$PYTHONPATH:"
sys.path.append(os.getcwd() + "/my_custom_module")

So far, nothing has worked. Appending it to sys.path made it show up inside start_script.py, but NOT inside example.py.
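A likely explanation for these results: each os.system call starts its own shell, so an export made there is gone as soon as that shell exits, and sys.path.append only changes the interpreter that is currently running. Note also that the directory that has to be on the path is the parent of my_custom_module, not my_custom_module itself. The following is a minimal sketch, not a confirmed Azure ML recipe, that passes PYTHONPATH to the child process explicitly through the env argument of subprocess.run. The helper names env_with_pythonpath and run_script are hypothetical, and using sys.executable instead of 'python3.9' is an assumption that the child should use the same interpreter as start_script.py.

```python
import os
import subprocess
import sys


def env_with_pythonpath(project_root, base_env=None):
    """Return a copy of the environment with project_root prepended to PYTHONPATH.

    project_root must be the directory that CONTAINS my_custom_module.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    if existing:
        env["PYTHONPATH"] = project_root + os.pathsep + existing
    else:
        env["PYTHONPATH"] = project_root
    return env


def run_script(script_path, project_root):
    """Run script_path in a child process that can import from project_root."""
    return subprocess.run(
        [sys.executable, script_path],  # assumption: reuse the parent's interpreter
        env=env_with_pythonpath(project_root),
        check=True,
    )


# Hypothetical usage inside start_script.py, assuming the working directory
# is the project root as in the debug output above:
#     run_script("scripts/example.py", os.getcwd())
```

Because the environment is handed to the child directly, nothing has to persist in a shell between calls.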

Does anyone have any idea how to solve my problem?


There is 1 answer

Sairam Tadepalli (best answer)

We cannot edit the PYTHONPATH inside the default pipeline. Instead, we can create a Data Science VM using an ARM template and make the custom modifications inside its current working directory.

Ubuntu Based:

# create an Ubuntu Data Science VM in your resource group
az vm create --resource-group YOUR-RESOURCE-GROUP-NAME --name YOUR-VM-NAME --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username YOUR-USERNAME --admin-password YOUR-PASSWORD --generate-ssh-keys --authentication-type password

Windows Based:

# create a Windows Server 2016 DSVM in your resource group
az vm create --resource-group YOUR-RESOURCE-GROUP-NAME --name YOUR-VM-NAME --image microsoft-dsvm:dsvm-windows:server-2016:latest --admin-username YOUR-USERNAME --admin-password YOUR-PASSWORD --authentication-type password

Create a conda environment for Azure Machine Learning:

conda create -n py310 python=3.10

Activate the environment and install the SDK:

conda activate py310
pip install azure-ai-ml

To deploy the template, follow the procedure described at the link below:

https://learn.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-tutorial-resource-manager