I'm using user_identity to read data from Azure Data Lake and save it to a datastore. I then want to use that datastore in my parallel job, but I keep running into this error:
Please specify a intermediate datastore for Parallel Run Step run-time when credential passthrough is enabled. Parallel Run Step will use your user identity to acceess the datastore. Warning: Please carefully control the scale of access to prevent intermediate data leak!
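For context, the datastore being read is registered against the ADLS Gen2 account without any stored credentials, so access goes through my identity. It looks roughly like this (the account and filesystem names are placeholders, not my real values):

$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: default
type: azure_data_lake_gen2
# account_name and filesystem below are placeholders for illustration only
account_name: <storage-account-name>
filesystem: <filesystem-name>
# no credentials section, so identity-based (credential passthrough) access is used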
Here's my job definition for reference:
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: model
description: a model
identity:
  type: user_identity
jobs:
  copy_model:
    type: command
    compute: azureml:default-compute
    environment: azureml:default-env
    inputs:
      input_path:
        type: uri_folder
        path: azureml://datastores/default/paths/
        mode: ro_mount
    outputs:
      output_path:
        type: uri_folder
        path: azureml://datastores/${{default_datastore}}/paths/model
    command: |
      rsync -ah --progress ${{inputs.input_path}}/model ${{outputs.output_path}}
  copy_data:
    type: command
    compute: azureml:default-compute
    environment: azureml:default-env
    inputs:
      input_path:
        type: uri_folder
        path: azureml://datastores/default/paths/
        mode: ro_mount
      input_folder: folder
    outputs:
      output_path:
        type: uri_folder
        path: azureml://datastores/${{default_datastore}}/paths/data
    command: |
      rsync -ah --progress ${{inputs.input_path}}/${{inputs.input_folder}} ${{outputs.output_path}}
  model:
    type: parallel
    compute: azureml:default-compute
    inputs:
      score_model:
        type: uri_folder
        path: ${{parent.jobs.copy_model.outputs.output_path}}
        mode: ro_mount
      job_data_path:
        type: uri_folder
        path: ${{parent.jobs.copy_data.outputs.output_path}}
        mode: ro_mount
    outputs:
      output_path:
        type: uri_file
        path: azureml://datastores/${{default_datastore}}/paths/results/output.csv
        mode: rw_mount
    mini_batch_size: "1"
    resources:
      instance_count: 1
    mini_batch_error_threshold: 5
    logging_level: "DEBUG"
    input_data: ${{inputs.job_data_path}}
    max_concurrency_per_instance: 2
    retry_settings:
      max_retries: 2
      timeout: 60
    task:
      type: run_function
      code: ./src
      entry_script: model.py
      environment: azureml:default-env
      program_arguments: >-
        --model-path ${{inputs.score_model}}
      append_row_to: ${{outputs.output_path}}