I am trying to migrate pipelines from Azure Machine Learning SDK V1 to V2, but sometimes I don't understand the logic behind V2 and I get stuck.
In V1, I just had to create a PythonScriptStep per script, wrap the steps in a StepSequence and submit the pipeline. My scripts are simple: no inputs, no outputs. We store data in ADLS Gen2 and use Databricks tables as inputs, which is why my steps don't pass any inputs/outputs to each other.
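from azureml.core.compute import ComputeTarget
from azureml.pipeline.core import Pipeline, StepSequence
from azureml.pipeline.steps import PythonScriptStep

script_step_1 = PythonScriptStep(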
    name="step1",
    script_name="main.py",
    arguments=arguments, # list of PipelineParameter
    compute_target=ComputeTarget(workspace=ws, name="cpu-16-128"),
    source_directory="./my_project_folder",
    runconfig=runconfig, # Conda + extra index url + custom dockerfile
    allow_reuse=False,
)
script_step_2 = PythonScriptStep(
    name="step2",
    ...
)
step_sequence = StepSequence(
    steps=[
        script_step_1,
        script_step_2,
    ]
)
# Create Pipeline
pipeline = Pipeline(
    workspace=ws,
    steps=step_sequence,
)
pipeline_run = experiment.submit(pipeline)
With V2, each step becomes a component (here a command component), which turns into a "node" when it is used inside a pipeline.
I've built my Environment from a Dockerfile using BuildContext, and fed a representation of requirements.txt into a conda environment dictionary where I added my extra index URL.
from azure.ai.ml.entities import BuildContext, Environment

azureml_env = Environment(
    build=BuildContext(
        path="./docker_folder", # With Dockerfile and requirements.txt
    ),
    name="my-project-env",
)
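The conda dictionary built from requirements.txt isn't shown above. A minimal sketch of that conversion, assuming a plain requirements.txt (no nested -r includes) and relying on the fact that the pip section of a conda environment file accepts pip options such as --extra-index-url as list entries. The function name requirements_to_conda, the conda-forge channel, the Python version and the feed URL are all hypothetical placeholders; how the dict is then handed to Environment depends on your setup and is not shown.

```python
from typing import Any, Dict, List


def requirements_to_conda(
    requirements_text: str,
    extra_index_url: str,
    python_version: str = "3.10",
    name: str = "my-project-env",
) -> Dict[str, Any]:
    """Build a conda environment dict whose pip section starts with an
    --extra-index-url option followed by the packages from requirements.txt."""
    packages: List[str] = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        packages.append(line.split(" #", 1)[0].strip())  # drop inline comments
    return {
        "name": name,
        "channels": ["conda-forge"],  # hypothetical channel choice
        "dependencies": [
            f"python={python_version}",
            "pip",
            {"pip": [f"--extra-index-url {extra_index_url}", *packages]},
        ],
    }


if __name__ == "__main__":
    # Hypothetical private feed URL, for illustration only.
    conda_env = requirements_to_conda(
        "pandas==2.0.0\n# pinned for the feature store\nrequests>=2.31  # http client\n",
        extra_index_url="https://pkgs.example.com/simple",
    )
    print(conda_env["dependencies"][2])
    # {'pip': ['--extra-index-url https://pkgs.example.com/simple', 'pandas==2.0.0', 'requests>=2.31']}
```

Putting the index option first in the pip list keeps it in effect for every package that follows, which is why it is prepended rather than appended.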
Now I create a command component that invokes Python on a script with some arguments:
from azure.ai.ml import command

step_1 = command(
    environment=azureml_env,
    command="python main.py",
    code="./my_project_folder",
)
Now that I have step_1 and step_2 in SDK V2, I have no clue how to chain them into a sequence without Inputs/Outputs:
from azure.ai.ml.dsl import pipeline

@pipeline(compute="serverless")
def default_pipeline():
    return {
        "my_pipeline": [step_1, step_2]
    }
I cannot get the pipeline to do a basic run of 2 consecutive steps.
I guess once I get this right, I can create and submit the pipeline job like this:
my_pipeline = default_pipeline()
# submit the pipeline job
pipeline_job = ml_client.jobs.create_or_update(
    my_pipeline,
    experiment_name=experiment_name,
)
UPDATE 1:
I tried to create my own (very naive) StepSequence with dummy inputs/outputs:
class CommandSequence:
    def __init__(self, commands, ml_client):
        self.commands = commands
        self.ml_client = ml_client
    def build(self):
        for i in range(len(self.commands)):
            cmd = self.commands[i]
            if i == 0:
                cmd = command(
                    display_name=cmd.display_name,
                    description=cmd.description,
                    environment=cmd.environment,
                    command=cmd.command,
                    code=cmd.code,
                    is_deterministic=cmd.is_deterministic,
                    outputs=dict(
                        my_output=Output(type="uri_folder", mode="rw_mount"),
                    ),
                )
            else:
                cmd = command(
                    display_name=cmd.display_name,
                    description=cmd.description,
                    environment=cmd.environment,
                    command=cmd.command,
                    code=cmd.code,
                    is_deterministic=cmd.is_deterministic,
                    inputs=self.commands[i - 1].outputs.my_output,
                    outputs=dict(
                        my_output=Output(type="uri_folder", mode="rw_mount"),
                    ),
                )
            cmd = self.ml_client.create_or_update(cmd.component)
            self.commands[i] = cmd
            print(self.commands[i])
        return self.commands
I had to recreate each command because a lot of the object's attributes are protected...
@pipeline(compute="serverless")
def default_pipeline():
    command_sequence = CommandSequence([step_1, step_2], ml_client).build()
    return {
        "my_pipeline": command_sequence[-1].outputs.my_output
    }
But it fails to link the output of step 1 to the input of step 2:
inputs=self.commands[i - 1].outputs.my_output, AttributeError: 'dict' object has no attribute 'my_output'
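The AttributeError says that .outputs on the object returned by ml_client.create_or_update is a plain dict, so attribute access fails there. Separately, in SDK V2 the link is usually made between nodes inside the @pipeline function (something like node_2 = step_2(after_step1=node_1.outputs.done)), not between component definitions. A minimal pure-Python sketch of the bindings the naive CommandSequence is trying to express, assuming every step declares one dummy output named done and one dummy input per predecessor; both names and the function chain_bindings are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical name of the dummy output every step declares.
DONE_PORT = "done"


def chain_bindings(step_names: List[str]) -> Dict[str, Dict[str, Tuple[str, str]]]:
    """Map each step to {input_name: (upstream_step, upstream_output)}.

    Step i (i > 0) consumes the dummy output of step i - 1; that data edge is
    what would make step i wait for step i - 1 in a data-driven pipeline.
    """
    bindings: Dict[str, Dict[str, Tuple[str, str]]] = {}
    for i, name in enumerate(step_names):
        if i == 0:
            bindings[name] = {}  # the first step waits for nothing
        else:
            previous = step_names[i - 1]
            bindings[name] = {f"after_{previous}": (previous, DONE_PORT)}
    return bindings


if __name__ == "__main__":
    for step, inputs in chain_bindings(["step1", "step2", "step3"]).items():
        print(step, inputs)
```

Each (upstream_step, output) pair corresponds to one node_x.outputs.done reference passed as a keyword argument when the next node is created inside the pipeline function.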
 
                        
I made my own tools to recreate something that achieves the same result.
I build a graph of steps (commands, or "nodes" in Azure terms), compute the dependency order of this graph, and then build the pipeline from it. This method works whether you want sequential or parallel steps, without the custom Inputs/Outputs that Azure Machine Learning forces us to use to define the workflow logic. Some people, like me, just want to execute step 1 before step 2 with no data passed between them, because the data is stored in a database or in Azure Storage.
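The graph-then-order approach described above can be sketched with the standard library's graphlib.TopologicalSorter (Python 3.9+). This is not the author's helper: the Step class and execution_stages function are hypothetical, and Step only stores keyword arguments intended to be forwarded to azure.ai.ml.command later. Steps are grouped into stages, where every step in a stage depends only on earlier stages, so steps within one stage could run in parallel.

```python
from graphlib import TopologicalSorter
from typing import Any, Dict, List, Optional, Set


class Step:
    """Hypothetical step: a name, the steps it must wait for, and keyword
    arguments intended to be forwarded unchanged to azure.ai.ml.command."""

    def __init__(self, name: str, depends_on: Optional[List[str]] = None, **command_kwargs: Any):
        self.name = name
        self.depends_on = list(depends_on or [])
        self.command_kwargs = command_kwargs


def execution_stages(steps: List[Step]) -> List[List[str]]:
    """Group step names into stages; a step only depends on earlier stages."""
    known = {step.name for step in steps}
    graph: Dict[str, Set[str]] = {}
    for step in steps:
        missing = set(step.depends_on) - known
        if missing:
            raise ValueError(f"{step.name} depends on unknown steps: {sorted(missing)}")
        graph[step.name] = set(step.depends_on)  # node -> its predecessors

    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError (a ValueError) on cycles
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable order
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    steps = [
        Step("step1", command="python main.py", code="./my_project_folder"),
        Step("step2", depends_on=["step1"], command="python main.py", code="./my_project_folder"),
        Step("step3", depends_on=["step1"], command="python main.py", code="./my_project_folder"),
        Step("step4", depends_on=["step2", "step3"], command="python main.py", code="./my_project_folder"),
    ]
    print(execution_stages(steps))  # [['step1'], ['step2', 'step3'], ['step4']]
```

Each stage would then be turned into nodes (for example via command(**step.command_kwargs)) and tied to the previous stage with dummy outputs/inputs, since those data edges are what a V2 pipeline uses to order nodes.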
Basically, Step can take the same arguments as the command function (azure.ai.ml.command). Here is how I use it:
Here is the output for this example: