I am trying to migrate pipelines from Azure Machine Learning SDK V1 to V2, but sometimes I don't understand the logic behind V2 and I get stuck.
In V1, I just had to create PythonScriptStep objects, wrap them in a StepSequence, and deploy the pipeline. My scripts are simple: no inputs, no outputs. We store data in ADLS Gen2 and use Databricks tables as inputs, which is why I don't have any pipeline inputs/outputs.
script_step_1 = PythonScriptStep(
    name="step1",
    script_name="main.py",
    arguments=arguments,  # list of PipelineParameter
    compute_target=ComputeTarget(workspace=ws, name="cpu-16-128"),
    source_directory="./my_project_folder",
    runconfig=runconfig,  # Conda + extra index url + custom dockerfile
    allow_reuse=False,
)
script_step_2 = PythonScriptStep(
    name="step2",
    ...
)
step_sequence = StepSequence(
    steps=[
        script_step_1,
        script_step_2,
    ]
)
# Create Pipeline
pipeline = Pipeline(
    workspace=ws,
    steps=step_sequence,
)

pipeline_run = experiment.submit(pipeline)
With V2, we need to create a "node" from a component that will be used by a pipeline.
I've built my Environment from a Dockerfile with BuildContext, and fed a representation of requirements.txt into a conda environment dictionary where I added my extra index URL.
azureml_env = Environment(
    build=BuildContext(
        path="./docker_folder",  # With Dockerfile and requirements.txt
    ),
    name="my-project-env",
)
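The conda environment dictionary with the extra index URL mentioned above looks roughly like this (a sketch; the package names and the index URL are placeholders, not my real dependencies):

# Sketch of the conda dictionary built from requirements.txt,
# with the extra index URL added to the pip section.
# Package names and the index URL below are placeholders.
conda_env = {
    "name": "my-project-env",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.10",
        "pip",
        {
            "pip": [
                "--extra-index-url https://my-private-feed.example.com/simple",
                "my-internal-package==1.2.3",
                "pandas",
            ]
        },
    ],
}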
Now I make a command component that invokes Python on a script with some arguments:
step_1 = command(
    environment=azureml_env,
    command="python main.py",
    code="./my_project_folder",
)
Now that I have my step_1 and step_2 in SDK V2, I have no clue how to make a sequence without Input/Output:
@pipeline(compute="serverless")
def default_pipeline():
    return {
        "my_pipeline": [step_1, step_2]
    }
I cannot manage to make the pipeline run two consecutive steps.
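From what I understand, the command objects are supposed to be called inside the @pipeline function to create the nodes, roughly like this (a sketch; without any inputs/outputs there is no dependency between the nodes, so nothing forces step 1 to run before step 2):

from azure.ai.ml.dsl import pipeline

@pipeline(compute="serverless")
def default_pipeline():
    # Calling the command objects creates the pipeline nodes.
    node_1 = step_1()
    node_2 = step_2()
    # No data flows between node_1 and node_2, so Azure ML sees no
    # dependency between them and will not run them in sequence.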
I guess after I manage to get this right, I can create/update the pipeline like this:
my_pipeline = default_pipeline()

# submit the pipeline job
pipeline_job = ml_client.jobs.create_or_update(
    my_pipeline,
    experiment_name=experiment_name,
)
UPDATE 1:
I tried to create my own StepSequence (very naive) with dummy inputs/outputs:
class CommandSequence:
    def __init__(self, commands, ml_client):
        self.commands = commands
        self.ml_client = ml_client

    def build(self):
        for i in range(len(self.commands)):
            cmd = self.commands[i]
            if i == 0:
                cmd = command(
                    display_name=cmd.display_name,
                    description=cmd.description,
                    environment=cmd.environment,
                    command=cmd.command,
                    code=cmd.code,
                    is_deterministic=cmd.is_deterministic,
                    outputs=dict(
                        my_output=Output(type="uri_folder", mode="rw_mount"),
                    ),
                )
            else:
                cmd = command(
                    display_name=cmd.display_name,
                    description=cmd.description,
                    environment=cmd.environment,
                    command=cmd.command,
                    code=cmd.code,
                    is_deterministic=cmd.is_deterministic,
                    inputs=self.commands[i - 1].outputs.my_output,
                    outputs=dict(
                        my_output=Output(type="uri_folder", mode="rw_mount"),
                    ),
                )
            cmd = self.ml_client.create_or_update(cmd.component)
            self.commands[i] = cmd
            print(self.commands[i])
        return self.commands
I had to recreate the command because they protected a lot of stuff in the object...
@pipeline(compute="serverless")
def default_pipeline():
    command_sequence = CommandSequence([step_1, step_2], ml_client).build()
    return {
        "my_pipeline": command_sequence[-1].outputs.my_output
    }
But it fails to link the output of step 1 to the input of step 2:

    inputs=self.commands[i - 1].outputs.my_output,
AttributeError: 'dict' object has no attribute 'my_output'
I made my own tools to recreate something that achieves the same behavior. I build a graph of steps (commands, or nodes in Azure language), get the dependency order of this graph, and build the pipeline from it. This method works if you want to create a pipeline with sequential or parallel steps, without the custom Inputs/Outputs that Azure Machine Learning forces us to use to define the workflow logic. Some people, like me, just want to execute step 1 before step 2 with no data passing between them, because the data is stored in a database or in Azure Storage.
Basically, Step can take the same arguments as the command function from azure.ai.ml.
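A minimal sketch of the idea (not my full implementation; Step, depends_on, build_sequence_pipeline and the dep_*/done names are placeholders I use here): each Step stores the arguments you would pass to command() plus the names of the steps it depends on, and the builder walks the graph in dependency order, giving every step a dummy uri_folder output and feeding the upstream dummy outputs into dummy inputs when the node is created inside the @pipeline function.

from azure.ai.ml import command, Input, Output
from azure.ai.ml.dsl import pipeline


class Step:
    """Wraps the keyword arguments you would pass to command(), plus dependencies."""

    def __init__(self, name, depends_on=None, **command_kwargs):
        self.name = name
        self.depends_on = depends_on or []  # names of upstream steps
        self.command_kwargs = command_kwargs


def _dependency_order(steps):
    """Tiny depth-first topological sort over the steps (assumes no cycles)."""
    by_name = {s.name: s for s in steps}
    ordered, seen = [], set()

    def visit(step):
        if step.name in seen:
            return
        for dep in step.depends_on:
            visit(by_name[dep])
        seen.add(step.name)
        ordered.append(step)

    for s in steps:
        visit(s)
    return ordered


def build_sequence_pipeline(steps, default_compute="serverless"):
    """Build a pipeline where ordering is enforced through dummy uri_folder outputs."""

    @pipeline(compute=default_compute)
    def generated_pipeline():
        nodes = {}
        for step in _dependency_order(steps):
            kwargs = dict(step.command_kwargs)
            cmd_str = kwargs.pop("command")

            # One dummy uri_folder input per upstream step; never actually read.
            inputs = {f"dep_{dep}": Input(type="uri_folder") for dep in step.depends_on}
            # One dummy output so that downstream steps can depend on this one.
            outputs = {"done": Output(type="uri_folder")}

            # Reference the dummy inputs/outputs on the command line so they are
            # bound; main.py has to tolerate (and can ignore) these extra
            # arguments, e.g. with argparse parse_known_args().
            for input_name in inputs:
                cmd_str += f" --{input_name} ${{{{inputs.{input_name}}}}}"
            cmd_str += " --done ${{outputs.done}}"

            component = command(
                name=step.name,
                command=cmd_str,
                inputs=inputs,
                outputs=outputs,
                **kwargs,
            )
            # Calling the command inside the @pipeline function creates the node;
            # wiring the upstream dummy outputs creates the execution order.
            node = component(
                **{f"dep_{dep}": nodes[dep].outputs.done for dep in step.depends_on}
            )
            nodes[step.name] = node

    return generated_pipeline()

For the two-step case above, usage would look roughly like this:

steps = [
    Step("step1", environment=azureml_env, code="./my_project_folder",
         command="python main.py"),
    Step("step2", depends_on=["step1"], environment=azureml_env,
         code="./my_project_folder", command="python main.py"),
]
pipeline_job = ml_client.jobs.create_or_update(
    build_sequence_pipeline(steps),
    experiment_name=experiment_name,
)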