I am building an Azure ML Pipeline for batch scoring. In one step I need to access a key stored in the workspace's Azure Keyvault.
However, I want to strictly separate the authoring environment (responsible for creating the datasets, building the environment, building and running the pipeline) and the production environment (responsible for transforming data, running the prediction etc.). Therefore, code in the production environment should be somewhat Azure agnostic. I want to be able to submit my inference script to Google Cloud Compute Instances, if needed.
Thus my question is: What is the best practise to pass secrets to remote runs without having the remote script retrieve it from the keyvault itself? Is there a way to have redacted environment variables or command line arguments?
Thanks!
Example of what I would like to happen:
# import all azure dependencies
secret = keyvault.get_secret("my_secret")
pipeline_step = PythonScriptStep(
script_name="step_script.py",
arguments=["--input_data", input_data, "--output_data", output_data],
compute_target=compute,
params=["secret": secret] # This will create an env var on the remote?
)
pipeline = Pipeline(workspace, steps=[pipeline_step])
PipelineEndpoint.publish(...)
An within step_script.py
:
# No imports from azureml!
secret = os.getenv("AML_PARAMETER_secret")
do_something(secret)