How to normalise complex JSON with multiple levels of nested information in Python

Question

How to normalise complex JSON with multiple levels of nested information in Python

194 views Asked by cyancyan At 01 January 2025 at 10:27

I get the data in a predefined format I cannot change:

data = 
{
  "id": 12345,
  "name": "Doe",
  "gender": {
    "textEn": "Masculin"
  },
  "professions": [
    {
      "job_description": {
        "textEn": "Job description"
      },
      "cetTitles": [
        {
          "cetTitleType": {
            "textEn": "Recognition"
          },
          "issuanceDate": "1992-04-14T00:00:00Z",
          "phoneNumbers": [
            "123 221 00 70"
          ]
        }
      ]
    }
  ]
}

While I can normalise data one level down ('gender') through the pd.json_normalize function, I struggle to access info further down in the hierarchy.

I tried e.g. getting the job description through data = pd.json_normalize(data,record_path=['professions','job_description'],meta='id') but I get a TypeError. How can I address this and extract all data into a table so the data above is represented as one row with all content as fields?

The output I would like to ultimately achieve looks as follows:

id	name	gender	job_description	cetTitleType	issuanceDate	phoneNumbers
12345	Doe	Masculin	Job description	Recognition	1992-04-14T00:00:00Z	123 221 00 70

Original Q&A

There are 1 answers

**Jason Baker** · Accepted Answer · 2023-12-05T15:42:41+00:00

Using json_normalize():

Code:

df = pd.json_normalize(
    data=data,
    record_path=["professions", "cetTitles"],
    meta=["id", "name", "gender", ["professions", "job_description"]]
).explode(column="phoneNumbers")

df = df.join(pd.DataFrame(df.pop("gender").values.tolist()))
df.columns = df.columns.str.split(".").str[-1]
df = df.rename(columns={"textEn": "gender"})
df = df.join(pd.DataFrame(df.pop("job_description").values.tolist()))
df = df.rename(columns={"textEn": "job_description"})
print(df)

Output:

           issuanceDate   phoneNumbers       gender     id name    gender  job_description
0  1992-04-14T00:00:00Z  123 221 00 70  Recognition  12345  Doe  Masculin  Job description

TechQA.

How to normalise complex JSON with multiple levels of nested information in Python

There are 1 answers

Related Questions in PYTHON

Related Questions in JSON

Related Questions in PANDAS

Related Questions in JSON-NORMALIZE

Popular Questions

Popular Tags

Trending Questions