- YAML appears correct, there are no validation issues, and the pipeline can be seen in the Azure ML Studio GUI
- I'm assuming the error is thrown by
mlflow.sklearn.autolog()
when the fit()
method is called
- Full stacktrace not available, the exception shown below is the full exception raised in Azue ML Studio GUI
- I've commented out the function
save_outputs()
and the same error is raised, leading to my assumption regarding MLFlow SDK attempting to autolog the model
- I haven't included the code below for the
predict
job in the pipeline, this step doesn't get to execute as the seep
job in the pipeline fails first
Exception raised in Azure ML GUI
UserErrorException:
Message: Model asset creation API failed with {'additional_properties': {'message': 'The request is invalid.', 'details': [{'code': 'ModelAssetPathNotFoundInStorage', 'message': 'No blobs found in storage at model asset path: azureml/HD_9b8798ab-c0cb-4c5d-8822-1411c01af249_0/model/'}], 'code': 'BadRequest', 'statusCode': 400}, 'error': <data_capability._restclient.model.models._models_py3.RootError object at 0x7f210407e310>, 'correlation': {'operation': '214b5eca2adce7f52fdba06fb5003437', 'request': '9515d04215048b81', 'RequestId': '9515d04215048b81'}, 'environment': '<REDACTED>', 'location': '<REDACTED>', 'time': datetime.datetime(2023, 1, 12, 23, 1, 27, 38782, tzinfo=<FixedOffset '+00:00'>), 'component_name': 'modelregistry'}
InnerException None
ErrorResponse
{
"error": {
"code": "UserError",
"message": "Model asset creation API failed with {'additional_properties': {'message': 'The request is invalid.', 'details': [{'code': 'ModelAssetPathNotFoundInStorage', 'message': 'No blobs found in storage at model asset path: azureml/HD_9b8798ab-c0cb-4c5d-8822-1411c01af249_0/model/'}], 'code': 'BadRequest', 'statusCode': 400}, 'error': <data_capability._restclient.model.models._models_py3.RootError object at 0x7f210407e310>, 'correlation': {'operation': '214b5eca2adce7f52fdba06fb5003437', 'request': '9515d04215048b81', 'RequestId': '9515d04215048b81'}, 'environment': '<REDACTED>', 'location': '<REDACTED>', 'time': datetime.datetime(2023, 1, 12, 23, 1, 27, 38782, tzinfo=<FixedOffset '+00:00'>), 'component_name': 'modelregistry'}"
}
} Marking the experiment as failed because initial child jobs have failed due to user error
CLI Command
$ az ml job create --subscription <REDACTED> --resource-group <REDACTED> --workspace-name <REDACTED> --file /home/azureuser/cloudfiles/code/Users/<REDACTED>/repos/<REDACTED>/src/assets/pipeline_tune.yml --stream
RunId: quirky_bone_tf250gdlfg
Web View: https://ml.azure.com/runs/<REDACTED>?wsid=/subscriptions/<REDACTED>/resourcegroups/<REDACTED>/workspaces/<REDACTED>
Streaming logs/azureml/executionlogs.txt
========================================
[2023-01-12 22:58:42Z] Submitting 1 runs, first five are: <REDACTED>
[2023-01-12 23:03:46Z] Execution of experiment failed, update experiment status and cancel running nodes.
Execution Summary
=================
RunId: <REDACTED>
Web View: https://ml.azure.com/runs/<REDACTED>?wsid=/subscriptions/<REDACTED>/resourcegroups/<REDACTED>/workspaces/<REDACTED>
Exception :
{
"error": {
"code": "UserError",
"message": "Pipeline has some failed steps. See child run or execution logs for more details.",
"message_format": "Pipeline has some failed steps. {0}",
"message_parameters": {},
"reference_code": "PipelineHasStepJobFailed",
"details": []
},
"environment": "<REDACTED>",
"location": "<REDACTED>",
"time": "2023-01-12T23:03:45.982134Z",
"component_name": ""
}
Folder structure
src/
assets/
component_train.yml
pipeline_tune.yml
train.py
src/assets/pipeline_tune.yml
# References
# ----------
# - How to create component pipelines
# - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-component-pipelines-cli
# - Pattern reference
# - https://github.com/Azure/azureml-examples/tree/main/cli/jobs/pipelines-with-components/pipeline_with_hyperparameter_sweep
# - Pipeline schema
# - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-pipeline
# - Sweep Job schema (hyperparameter tuning)
# - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-sweep
# - Core Azure ML YAML syntax
# - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-core-syntax#binding-inputs-and-outputs-between-steps-in-a-pipeline-job
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
# -------------------------------------------------------------------
# Pipeline settings
# - Having inputs defined at the pipelione level, instead of the first
# job, allows for parameterisation of the pipeline via both CLI/SDK
experiment_name: <REDACTED>
description: Tune hyperparemeters for training a scikit-learn SVM on the Iris dataset.
settings:
default_compute: azureml:aml-compute-cpu
default_datastore: azureml:workspaceblobstore
inputs:
data:
type: uri_file
mode: ro_mount
path: wasbs://[email protected]/iris.csv
outputs:
predict:
type: uri_folder
mode: rw_mount
path: azureml://datastores/workspaceblobstore/paths/<REDACTED>
# -------------------------------------------------------------------
# Jobs
jobs:
# Tune job
tune:
type: sweep
inputs:
data: ${{parent.inputs.data}}
outputs:
model:
type: mlflow_model
test_data:
type: uri_folder
trial: ./component_train.yml
search_space:
c_value:
type: uniform
min_value: 0.5
max_value: 0.9
kernel:
type: choice
values:
- rbf
- linear
- poly
coef0:
type: uniform
min_value: 0.1
max_value: 1
sampling_algorithm: random
objective:
goal: minimize
primary_metric: training_f1_score
limits:
max_total_trials: 20
max_concurrent_trials: 10
timeout: 7200
# Score test data
predict:
type: command
inputs:
model: ${{parent.jobs.tune.outputs.model}}
test_data: ${{parent.jobs.tune.outputs.test_data}}
outputs:
predictions: ${{parent.outputs.predict}}
component: ./component_predict.yml
src/assets/component_train.yml
# References
# ----------
# - How to create component pipelines
# - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-component-pipelines-cli
# - Command schema
# - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-command
# - Core Azure ML YAML syntax
# - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-core-syntax#binding-inputs-and-outputs-between-steps-in-a-pipeline-job
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
# -------------------------------------------------------------------
# Command settings
version: 1
description: Training a scikit-learn SVM on the Iris dataset
environment: azureml:<REDACTED>:1
inputs:
data:
type: uri_file
c_value:
type: number
default: 1.0
kernel:
type: string
default: rbf
coef0:
type: number
default: 0
outputs:
model:
type: mlflow_model
test_data:
type: uri_folder
# -------------------------------------------------------------------
# Job
code: ..
command: >-
python train.py
--data ${{inputs.data}}
--C ${{inputs.c_value}}
--kernel ${{inputs.kernel}}
--coef0 ${{inputs.coef0}}
--outputs_model ${{outputs.model}}
--outputs_test_data ${{outputs.test_data}}
src/train.py
"""
Notes
-----
- Imports in this file must match the imports in `score.py` to allow
pickle objects to be loaded correctly by `score.py`
References
----------
- Azure ML Environments and ScriptRunConfig for training
- i.e How to execute this script against Azure ML compute cluster
- https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#use-environments-for-training
- Azure ML - Hyperparameter tuning
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
- Azure ML - Hyperparameter tuning in Azure Machine Learning pipeline
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-sweep-in-pipeline#how-to-do-hyperparameter-tuning-in-azure-machine-learning-pipeline
- Azure ML - Random, Grid, Bayesian sampling
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters#sampling-the-hyperparameter-space
- Azure ML - How to Train scikit-learn
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn#prepare-the-training-script
- Azure ML - How to train Tesnorflow
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-tensorflow
- Azure ML - How to train Keras
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-keras
- Azure ML - How to train PyTorch
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-pytorch
- Azure ML - Logging with MLFlow
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-log-view-metrics?tabs=jobs#getting-started
- MLFlow - Autologging of frameworks
- https://mlflow.org/docs/latest/tracking.html#automatic-logging
"""
import argparse
from distutils.dir_util import copy_tree
from pathlib import Path
import mlflow.sklearn
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
def get_data(data):
"""Get data and return train/test splits"""
df = pd.read_csv(data)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=0
)
# Return split data
return X_train, X_test, y_train, y_test
def get_hyperparameters(**kwargs):
"""Set hyperparameters here
References
----------
- Understand how to pass hyperparameters to a sklearn pipeline
- https://stackoverflow.com/questions/66388056/why-does-sklearn-pipeline-set-params-not-work
"""
hyperparameters = {
"estimator__C": kwargs.get("c_value"),
"estimator__kernel": kwargs.get("kernel"),
"estimator__coef0": kwargs.get("coef0"),
}
return hyperparameters
def save_outputs(model, model_dir, X_test, y_test, test_data_dir):
"""Save outputs of the training process"""
# Save model
local_dir = "model"
mlflow.sklearn.save_model(model, local_dir)
copy_tree(local_dir, model_dir)
# Save test data
X_test.to_csv(Path(test_data_dir) / "X_test.csv", index=False)
y_test.to_csv(Path(test_data_dir) / "y_test.csv", index=False)
def train_model(hyperparameters, X_train, y_train):
"""Train the model with your chosen framework here"""
# Model architecture
model = Pipeline(
steps=[
("scaler", StandardScaler()),
("estimator", SVC()),
]
)
# Set hyperparameters for training run
model.set_params(**hyperparameters)
# Train model
model.fit(X_train, y_train)
return model
def parse_args():
"""Parse args and hyperparameters"""
parser = argparse.ArgumentParser()
# Parse mandatory args
parser.add_argument("data", help="Path to data for training", type=str)
parser.add_argument("outputs_model", help="Name of the model", type=str)
parser.add_argument("outputs_test_data", help="Path to data for testing", type=str)
# Parse hyperparameter args
parser.add_argument("c_value", help="Coeffiecient for the estimator", type=str)
parser.add_argument("kernel", help="Kernel for the estimator", type=str)
parser.add_argument("coef0", help="Coeffiecient for the estimator", type=str)
# Get args
args = parser.parse_args()
return args
def main(**kwargs):
"""Train the model(s)
Parameters
----------
kwargs : dict
Dictionary of all parsed arguments
"""
# Logging
# - Autologging works with (0.22.1 <= scikit-learn <= 1.1.3)
# - See: https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.autolog
mlflow.sklearn.autolog()
# Setup
X_train, X_test, y_train, y_test = get_data(kwargs.get("data"))
hyperparameters = get_hyperparameters(**kwargs)
# Train
model = train_model(hyperparameters, X_train, y_train)
# Output
save_outputs(model, kwargs.get("outputs_model"), X_test, y_test, kwargs.get("outputs_test_data"))
if __name__ == "__main__":
"""Entrypoint for training the model(s)"""
main(**vars(parse_args()))