Pipeline job parameters not passed from pipeline yaml

142 views Asked by At

I'm running a pipeline job with a sweep step in AzureML. I'm using the CLI syntax to create the components. I have a pipeline.yaml file that refers to train.yaml as the trial. train.yaml in turn calls train.py.

The job fails after launch because the path that I'm passing it as an argument within train.py, which should be azureml:dataset_name:1, is None.

To test what is going on, I logged the other arguments passed. And I discovered that even though I specify the parameters you see below: enter image description here

in my logger, the values are those that coincide with the defaults in my add_argument definition within train.py (the below is from the log of one of the child runs in the sweep): enter image description here

So let's ignore the question of the path being None for now. I want to understand why the values from pipeline.yaml aren't being passed correctly. I triple-checked my arguments and the names match between the 3 files.

For reference:

pipeline.yaml

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: test_sweep
description: Tune hyperparameters
settings:
  default_compute: azureml:test-compute
jobs:
  sweep_step:
    type: sweep
    inputs:
      data_path:
        type: uri_file
        path: azureml:test_data:1
      seq_length: 100
      epochs: 1
    outputs:
      model_output:
    sampling_algorithm: random
    search_space:
      batch_size:
        type: choice
        values: [1, 5, 10, 15]
      learning_rate:
        type: loguniform
        min_value: -6.90775527898
        max_value: -2.30258509299
    trial: ./train.yaml
    objective:
      goal: maximize
      primary_metric: bleu_score
    limits:
      max_total_trials: 5
      max_concurrent_trials: 3
      timeout: 3600 # 1 hour
      trial_timeout: 720 # 20 mins

train.yaml

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: train_model
display_name: train_model
version: 1

inputs:
  data_path:
    type: uri_file
  batch_size:
    type: integer
  learning_rate:
    type: number
  seq_length:
    type: integer
  epochs:
    type: integer

outputs:
  model_output:
    type: mlflow_model

code: ../src

environment: azureml:test_env:2

command: >-
  python train.py
    --data_path ${{inputs.data_path}}
    --output_path ${{outputs.model_output}}
    --batch_size ${{inputs.batch_size}}
    --learning_rate ${{inputs.learning_rate}}
    --seq_length ${{inputs.seq_length}}
    --epochs ${{inputs.epochs}}

train.py (just the relevant part)

import argparse
import logging

def main(args):
    data_path = args.data_path
    output_path = args.output_path
    batch_size = args.batch_size
    seq_length = args.seq_length
    epochs = args.epochs
    learning_rate = args.learning_rate

    handler = logging.StreamHandler()
    logger = logging.getLogger(__name__)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    logger.info(f'path: {data_path}')
    logger.info(f'batch_size: {batch_size}')
    logger.info(f'seq_length: {seq_length}')
    logger.info(f'epochs: {epochs}')

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path", type=str)
    parser.add_argument("--output_path")
    parser.add_argument("--batch_size", type=int, default=5)
    parser.add_argument("--learning_rate", type=float, default=1e-5)
    parser.add_argument("--seq_length", type=int, default=500)
    parser.add_argument("--epochs", type=int, default=3)
    args = parser.parse_args()

    return args


if __name__ == "__main__":
    args = parse_args()
    main(args)
0

There are 0 answers