Convert a nested, dynamic JSON structure to a Spark DataFrame


As a data engineer, I want to create a DataFrame from dynamic, nested JSON in Python. The structure of the JSON is shown below.

f is nested and can contain multiple key-value pairs, similar to j. Each of these sections can also contain multiple lists.

 {
      "a": "a",
      "b": "b",
      "c": "c",
      "d": "d",
      "e": true,
      "f": [
        { "f.1": "f1", "f.2": "f22", "f.3": "f3" },
        { "f.1": "f11", "f.2": "f22", "f.3": "f33" }
    .
    .
    .
      ],
      "g": [
        {
          "g.1": "g1",
          "g.2": [
            {
              "g.2.1": { "g.2.1.1": "g211", "g.2.1.2": "g212" },
              "g.2.2": [
                {
                  "g.2.2.1": "g221",
                  "g.2.2.2": "g222"
                }
              ]
            }
          ]
        }
      ],
      "h": [],
      "i": [],
      "j": [
        {
          "j.1": "j1",
          "j.2": "j2",
          "j.3": "j3",
          "j.4": "j4",
          "j.5": "j5"
        },
        {
          "j.1": "j11",
          "j.2": "j22",
          "j.3": "j33",
          "j.4": "j44",
          "j.5": "j55"
        }
    .
    .
    .
      ]
    }

Also, if a value is null or blank, I still want its key to appear as a column in the DataFrame, as with key h. If f contains multiple entries, then a, b, c, etc. should repeat on each resulting row. I want the data in the format below.

[image: desired output format]
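
To make the requirements above concrete, here is a minimal sketch in plain Python of one way the flattening could work. It is not a confirmed solution: the function names `flatten` and `align_columns` are hypothetical, joining parent and child keys with `_` is an assumed naming scheme, and since the desired output image is not available, the sketch assumes that sibling lists such as f and j should be combined as a cross product (which is also what chaining one explode per list would produce).

```python
from itertools import product
from typing import Any, Dict, List


def flatten(node: Any, prefix: str = "") -> List[Dict[str, Any]]:
    """Return one flat dict per output row for a nested JSON value."""
    if isinstance(node, dict):
        # Flatten every key on its own, then combine the per-key row lists
        # so that scalars repeat next to each element of a nested list.
        parts = [
            flatten(value, f"{prefix}_{key}" if prefix else key)
            for key, value in node.items()
        ]
        rows = []
        for combo in product(*parts):
            row: Dict[str, Any] = {}
            for piece in combo:
                row.update(piece)
            rows.append(row)
        return rows
    if isinstance(node, list):
        if not node:
            # Empty list (like h or i): keep the key as a column with no value.
            return [{prefix: None}]
        rows = []
        for item in node:
            # Each list element contributes its own row(s).
            rows.extend(flatten(item, prefix))
        return rows
    # Scalar (string, number, bool, or null): a single one-column row.
    return [{prefix: node}]


def align_columns(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Give every row the same columns, filling missing ones with None."""
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return [{column: row.get(column) for column in columns} for row in rows]


if __name__ == "__main__":
    # Trimmed-down version of the sample document from the question.
    record = {
        "a": "a",
        "e": True,
        "f": [{"f.1": "f1", "f.2": "f22"}, {"f.1": "f11", "f.2": "f22"}],
        "h": [],
    }
    for row in align_columns(flatten(record)):
        print(row)
```

The aligned rows could then be handed to `spark.createDataFrame`, ideally with an explicit schema, since Spark cannot infer a type for columns that contain only nulls (such as h and i here). Note also that column names containing dots, like `f_f.1`, need backtick quoting when referenced in Spark expressions, so renaming them may be simpler.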
