Is there any way to inject "Resources" memory values for a pipeline in Data Fusion?


I'm trying to automate some pipeline executions in Google Cloud Data Fusion (we are on versions 6.1.4 and 6.4.0 at the moment). We currently inject some "runtime args" into Data Fusion through a PUT API call. My question is about injecting parameters that modify the Configure section. For example, we currently use the "system.profile.name" parameter to tell the pipeline to use a specific profile, see: Screenshot of Runtime Arguments.
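For context, the runtime-args call we make can be sketched roughly as below. This is a minimal sketch, assuming CDAP's standard preferences endpoint for runtime arguments; the instance URL, namespace, pipeline name, and token handling are placeholders for your own environment:

```python
import json
import urllib.request

def preferences_url(base_url, namespace, app):
    # CDAP preferences endpoint; runtime args set here apply to the app's runs.
    return f"{base_url}/v3/namespaces/{namespace}/apps/{app}/preferences"

def set_runtime_args(base_url, namespace, app, args, token):
    # PUT the runtime arguments (e.g. a "system.profile.name" entry)
    # as preferences on an already-deployed pipeline.
    req = urllib.request.Request(
        preferences_url(base_url, namespace, app),
        data=json.dumps(args).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",  # placeholder auth handling
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```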

I'm wondering if there is a similar configuration option to set the "Configure/Resources/Executor Memory" value: Screenshot of the "Configure/Resources" section. I know this can be configured by hand in the UI, or by setting a different value in the pipeline template (JSON) before importing the pipeline. But I would like to know if there is any way to automate this once the pipeline is deployed (I do not want to re-deploy the pipeline each time I want to modify this).

Thanks in advance!

1 Answer

Answered by Venudhar Ravishankar:

I don't believe these values can be set as preferences, but you can use CDAP's PUT API to set them at pipeline upload time, or to update an already-deployed pipeline:

PUT /v3/namespaces/<namespace-id>/apps/<pipeline-name>

{
    "name": "<pipeline-name>",
    "description": "Data Pipeline Application",
    "artifact": {
        "name": "cdap-data-pipeline",
        "version": "[6.1.1,7.0.0)",
        "scope": "SYSTEM"
    },
    "config": {
        "resources": {
            "memoryMB": 9999,
            "virtualCores": 9
        },
        "driverResources": {
            "memoryMB": 9999,
            "virtualCores": 9
        },
    ...
    }
...
}

The uploaded JSON should be your entire pipeline, but with the driver and executor resources set to your preferred values. This should make automation much easier than changing the values through the UI each time. Please let me know if you have more questions.
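To avoid editing the JSON by hand, the update can be scripted: take the exported pipeline spec, patch the resource fields, and PUT the whole thing back. A minimal sketch, assuming the same PUT endpoint as above; the base URL, namespace, pipeline name, and auth handling are placeholders:

```python
import json
import urllib.request

def patch_resources(pipeline, executor_mb=None, driver_mb=None):
    # Adjust executor/driver memory in an exported pipeline spec
    # before re-uploading the whole app.
    cfg = pipeline["config"]
    if executor_mb is not None:
        cfg.setdefault("resources", {})["memoryMB"] = executor_mb
    if driver_mb is not None:
        cfg.setdefault("driverResources", {})["memoryMB"] = driver_mb
    return pipeline

def update_pipeline(base_url, namespace, name, pipeline, token):
    # PUT the full (patched) pipeline spec to
    # /v3/namespaces/<namespace-id>/apps/<pipeline-name>, updating the
    # deployed app without going through the UI.
    url = f"{base_url}/v3/namespaces/{namespace}/apps/{name}"
    req = urllib.request.Request(
        url,
        data=json.dumps(pipeline).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",  # placeholder auth handling
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```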