How can I get DAG of Spark Sql Query execution plan?

Question

Asked by akash patel on 02 October 2020 at 13:13 · 1.5k views

I am analyzing Spark SQL query execution plans. The plans printed by the explain() API are not very readable. The Spark web UI, by contrast, shows a DAG that is divided into jobs, stages, and tasks and is much easier to read. Is there a way to create that graph from the execution plan, or through an API in code? If not, is there an API that can read the graph from the UI?

Original Q&A

There is 1 answer

**Liangjun** · Answer 1 · 2020-10-28T15:35:41+00:00

As far as I can tell, this project (https://github.com/AbsaOSS/spline-spark-agent) can interpret the execution plan and render it in a readable form. In the example here, the Spark job reads a file, converts it to CSV, and writes the result to the local filesystem.

A sample of its JSON output looks like this:

{
    "id": "3861a1a7-ca31-4fab-b0f5-6dbcb53387ca",
    "operations": {
        "write": {
            "outputSource": "file:/output.csv",
            "append": false,
            "id": 0,
            "childIds": [
                1
            ],
            "params": {
                "path": "output.csv"
            },
            "extra": {
                "name": "InsertIntoHadoopFsRelationCommand",
                "destinationType": "csv"
            }
        },
        "reads": [
            {
                "inputSources": [
                    "file:/Users/liajiang/Downloads/spark-onboarding-demo-application/src/main/resources/wikidata.csv"
                ],
                "id": 2,
                "schema": [
                    "6742cfd4-d8b6-4827-89f2-4b2f7e060c57",
                    "62c022d9-c506-4e6e-984a-ee0c48f9df11",
                    "26f1d7b5-74a4-459c-87f3-46a3df781400",
                    "6e4063cf-4fd0-465d-a0ee-0e5c53bd52b0",
                    "2e019926-3adf-4ece-8ea7-0e01befd296b"
                ],
                "params": {
                    "inferschema": "true",
                    "header": "true"
                },
                "extra": {
                    "name": "LogicalRelation",
                    "sourceType": "csv"
                }
            }
        ],
        "other": [
            {
                "id": 1,
                "childIds": [
                    2
                ],
                "params": {
                    "name": "`source`"
                },
                "extra": {
                    "name": "SubqueryAlias"
                }
            }
        ]
    },
    "systemInfo": {
        "name": "spark",
        "version": "2.4.2"
    },
    "agentInfo": {
        "name": "spline",
        "version": "0.5.5"
    },
    "extraInfo": {
        "appName": "spark-spline-demo-application",
        "dataTypes": [
            {
                "_typeHint": "dt.Simple",
                "id": "f0dede5e-8fe1-4c22-ab24-98f7f44a9a5a",
                "name": "timestamp",
                "nullable": true
            },
            {
                "_typeHint": "dt.Simple",
                "id": "dbe1d206-3d87-442c-837d-dfa47c88b9c1",
                "name": "string",
                "nullable": true
            },
            {
                "_typeHint": "dt.Simple",
                "id": "0d786d1e-030b-4997-b005-b4603aa247d7",
                "name": "integer",
                "nullable": true
            }
        ],
        "attributes": [
            {
                "id": "6742cfd4-d8b6-4827-89f2-4b2f7e060c57",
                "name": "date",
                "dataTypeId": "f0dede5e-8fe1-4c22-ab24-98f7f44a9a5a"
            },
            {
                "id": "62c022d9-c506-4e6e-984a-ee0c48f9df11",
                "name": "domain_code",
                "dataTypeId": "dbe1d206-3d87-442c-837d-dfa47c88b9c1"
            },
            {
                "id": "26f1d7b5-74a4-459c-87f3-46a3df781400",
                "name": "page_title",
                "dataTypeId": "dbe1d206-3d87-442c-837d-dfa47c88b9c1"
            },
            {
                "id": "6e4063cf-4fd0-465d-a0ee-0e5c53bd52b0",
                "name": "count_views",
                "dataTypeId": "0d786d1e-030b-4997-b005-b4603aa247d7"
            },
            {
                "id": "2e019926-3adf-4ece-8ea7-0e01befd296b",
                "name": "total_response_size",
                "dataTypeId": "0d786d1e-030b-4997-b005-b4603aa247d7"
            }
        ]
    }
}
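To get from this JSON to an actual graph, one option is to walk the `id` and `childIds` fields and emit Graphviz DOT. The following is a minimal sketch, not part of Spline itself: the helper names (`collect_operations`, `dag_edges`, `to_dot`) are hypothetical, the embedded sample is a trimmed copy of the `operations` section above, and it assumes that `childIds` lists the operations that feed into a given operation, which is how the sample reads (write 0 has child 1, which has child 2, the read). Note that this gives the logical plan lineage, not the jobs/stages/tasks DAG shown in the Spark UI.

```python
import json

# Trimmed copy of the "operations" section from the sample output above.
SAMPLE = json.loads("""
{
  "operations": {
    "write": {"id": 0, "childIds": [1],
              "extra": {"name": "InsertIntoHadoopFsRelationCommand"}},
    "reads": [{"id": 2, "extra": {"name": "LogicalRelation"}}],
    "other": [{"id": 1, "childIds": [2], "extra": {"name": "SubqueryAlias"}}]
  }
}
""")


def collect_operations(plan):
    """Return {operation id: operation dict} across write, reads and other."""
    ops = plan["operations"]
    all_ops = [ops["write"]] + ops.get("reads", []) + ops.get("other", [])
    return {op["id"]: op for op in all_ops}


def dag_edges(plan):
    """Return sorted (child_id, parent_id) pairs.

    Assumption: childIds are the inputs of an operation, so an edge
    child -> parent points in the direction the data flows.
    """
    edges = []
    for op_id, op in collect_operations(plan).items():
        for child_id in op.get("childIds", []):
            edges.append((child_id, op_id))
    return sorted(edges)


def to_dot(plan):
    """Render the operations as a Graphviz DOT digraph string."""
    by_id = collect_operations(plan)
    lines = ["digraph plan {"]
    for op_id in sorted(by_id):
        name = by_id[op_id].get("extra", {}).get("name", "op")
        lines.append(f'  n{op_id} [label="{op_id}: {name}"];')
    for src, dst in dag_edges(plan):
        lines.append(f"  n{src} -> n{dst};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(SAMPLE))
```

If Graphviz is installed, saving the printed text as `plan.dot` and running `dot -Tpng plan.dot -o plan.png` should render the read → alias → write chain as an image.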
