PySpark: JSON to DataFrame schema


I have a tricky JSON document that I would like to load into a DataFrame, and I need help defining a schema for it. The top-level keys combine an ID and a father's name:

{
    "1-john": {
        "children": ["jack", "jane", "jim"]
    },
    "2-chris": {
        "children": ["bill", "will"]
    }
}

DataFrame output needed:

ID  father  children
1   john    ["jack", "jane", "jim"]
2   chris   ["bill", "will"]

There is 1 answer

keramat On

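With pandas, you can use the following (d is the dict parsed from the JSON in the question):

import json
from io import StringIO

import pandas as pd

t = json.dumps(d)
# Newer pandas versions expect a file-like object rather than a raw JSON string
df = pd.read_json(StringIO(t), orient='index')
# Split each "<ID>-<father>" index label once; the result keeps df's index,
# so the new columns align row by row instead of filling with NaN
parts = df.index.to_series().str.split('-', n=1)
df['ID'] = parts.str[0]
df['father'] = parts.str[1]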

You can then convert this to a PySpark DataFrame (spark_session is assumed to be an existing SparkSession):

df_sp = spark_session.createDataFrame(df)
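
If you would rather skip the pandas round trip, the same reshaping can be sketched in plain Python and handed to Spark with an explicit schema. This is only a minimal sketch, not something from the answer above: the helper name to_rows is made up for illustration, the JSON is inlined as a string rather than read from a file, and the Spark call is left as a comment because it assumes a running SparkSession named spark_session.

```python
import json

# The JSON from the question, inlined as a string (in practice, read it from a file).
raw = '''
{
    "1-john": {"children": ["jack", "jane", "jim"]},
    "2-chris": {"children": ["bill", "will"]}
}
'''


def to_rows(data):
    """Hypothetical helper: turn {"<ID>-<father>": {"children": [...]}}
    into a list of (ID, father, children) tuples."""
    rows = []
    for key, value in data.items():
        # Split only on the first hyphen so a hyphenated name stays intact.
        id_part, father = key.split("-", 1)
        rows.append((int(id_part), father, value["children"]))
    return rows


rows = to_rows(json.loads(raw))
print(rows)

# Assuming an existing SparkSession called spark_session, as in the answer:
# schema = "ID int, father string, children array<string>"
# df_sp = spark_session.createDataFrame(rows, schema=schema)
```

Since the keys vary per record, flattening them into rows first means the schema only has to describe three fixed columns, with children as an array of strings.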