PySpark: JSON to DataFrame schema

Question

47 views · Asked by RData on 27 April 2022 at 02:40

I have some tricky JSON that I would like to load into a DataFrame, and I need help defining a schema for it. Each top-level key combines an ID and the father's name, separated by a hyphen:

{
    "1-john": {
        "children": ["jack", "jane", "jim"]
    },
    "2-chris": {
        "children": ["bill", "will"]
    }
}

Desired DataFrame output:

ID	father	children
1	john	["jack", "jane", "jim"]
2	chris	["bill", "will"]


**keramat** · Answer 1 · 2022-04-27T02:51:29+00:00

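With pandas, you can use the following (here `d` is the JSON from the question, already parsed into a Python dict):

import json
from io import StringIO
import pandas as pd

t = json.dumps(d)
# orient='index' makes each top-level key ("1-john", ...) a row label
df = pd.read_json(StringIO(t), orient='index')
# Split the "ID-father" labels on the first hyphen; using df.index directly
# keeps the new columns aligned with the rows
parts = df.index.str.split('-', n=1)
df['ID'] = parts.str[0]
df['father'] = parts.str[1]
df = df.reset_index(drop=True)[['ID', 'father', 'children']]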

You can then convert this to a PySpark DataFrame (assuming `spark_session` is an active SparkSession):

df_sp = spark_session.createDataFrame(df)
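
Since the question asks about a schema specifically, here is a rough sketch that skips pandas and passes an explicit schema to Spark. The helper names `flatten_families` and `to_spark_df` are made up for illustration, and treating `ID` as an integer is an assumption (the question does not say whether it should be a number or a string). The PySpark import sits inside the function so the reshaping step can be tried without Spark installed.

```python
import json

# Sample data copied from the question.
RAW = '''
{
    "1-john": {"children": ["jack", "jane", "jim"]},
    "2-chris": {"children": ["bill", "will"]}
}
'''

def flatten_families(data):
    """Turn {"<ID>-<father>": {"children": [...]}} into (ID, father, children) tuples."""
    rows = []
    for key, value in data.items():
        # Split on the first hyphen only, so a name containing a hyphen stays intact.
        id_part, father = key.split("-", 1)
        # Assumption: the ID is numeric; drop int() to keep it as a string.
        rows.append((int(id_part), father, list(value.get("children", []))))
    return rows

def to_spark_df(spark, rows):
    """Build a Spark DataFrame with an explicit schema (requires pyspark)."""
    from pyspark.sql.types import (ArrayType, IntegerType, StringType,
                                   StructField, StructType)
    schema = StructType([
        StructField("ID", IntegerType(), False),
        StructField("father", StringType(), False),
        StructField("children", ArrayType(StringType()), True),
    ])
    return spark.createDataFrame(rows, schema=schema)

rows = flatten_families(json.loads(RAW))
```

With an active SparkSession, `to_spark_df(spark_session, rows)` should give the three columns from the question. The reshaping happens on the driver in plain Python, so this only suits JSON small enough to fit in memory.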
