Flattening a nested JSON file into a DataFrame using pandas json_normalize


228 views Asked by Nano At 03 November 2023 at 13:46

I have a big JSON file and I want to convert it into tabular form. I am trying to flatten the data into a DataFrame using json_normalize. So far I have this:

code so far

I want to further flatten the submissions and products data into columns. I tried this:

submission_data = pd.json_normalize(
    data=rawData['results'],
    record_path=rawData['results']['submissions'],
    meta=['application_number', 'sponsor_name'],
    errors='ignore',
)
submission_data.head(3)

But I am getting an error saying: TypeError: list indices must be integers or slices, not str

Any input on this would be helpful.


**Lourenço Monteiro Rodrigues** · Answer 1 · 2023-11-03T14:24:49+00:00

As submissions and products are lists (and not objects with a regular structure), json_normalize will leave them untouched. Also, given that they are lists, can you make sure they always contain the same number of items in each record? If not, distributing them across columns makes no sense. If submissions and products are pairs (i.e. every submission corresponds to one product), you can consider distributing them across rows instead (a melting strategy for the dataframe).
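To illustrate the point about lists, here is a minimal sketch on invented data. The `sponsor` object and all field values are hypothetical; only `application_number` and `submissions` come from the question. It shows json_normalize flattening a nested object into a dotted column while leaving a list in a single cell, and `DataFrame.explode` as one way to spread that list across rows.

```python
import pandas as pd

# Hypothetical record: 'sponsor' is a nested object, 'submissions' is a list
records = [
    {
        "application_number": "A001",
        "sponsor": {"name": "Acme"},
        "submissions": [
            {"submission_type": "ORIG"},
            {"submission_type": "SUPPL"},
        ],
    }
]

flat = pd.json_normalize(records)
# The nested object becomes the column 'sponsor.name';
# the list stays as one Python list inside a single cell.

# One row per list item (each cell then holds one dict)
long = flat.explode("submissions", ignore_index=True)
```

Each exploded cell still holds a dict, so a further json_normalize pass is needed to turn its keys into columns.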

Finally, regarding the error: raw_data seems to be a list of objects that each contain a 'results' field. That means you cannot retrieve raw_data['results'] directly, only raw_data[0]['results'] to get the results from the first object.
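The same TypeError would also appear under a different reading of the data, which the question does not let us rule out: if the file is a single object whose 'results' field is a list of records, then the expression rawData['results']['submissions'] indexes a list with a string. In either case, record_path expects the name of the key (or a list of keys), not the data itself. A sketch under that assumption, with invented sample values:

```python
import pandas as pd

# Hypothetical sample: a dict whose 'results' is a list of records
raw_data = {
    "results": [
        {
            "application_number": "A001",
            "sponsor_name": "Acme",
            "submissions": [
                {"submission_type": "ORIG", "submission_number": "1"},
                {"submission_type": "SUPPL", "submission_number": "2"},
            ],
        },
        {
            "application_number": "A002",
            "sponsor_name": "Beta",
            "submissions": [
                {"submission_type": "ORIG", "submission_number": "1"},
            ],
        },
    ]
}

# Indexing the list with a string key reproduces the reported error
try:
    raw_data["results"]["submissions"]
except TypeError as exc:
    error_message = str(exc)

# Pass the key name to record_path instead of the data
submission_data = pd.json_normalize(
    raw_data["results"],
    record_path="submissions",
    meta=["application_number", "sponsor_name"],
    errors="ignore",
)
```

The result has one row per submission, with application_number and sponsor_name repeated on each row.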

Adding a solution proposition

Given your data structure, what I would do is the following:

1. Normalize raw_data as you do in the notebook.
2. For each row of the resulting dataframe:
   a. Normalize the JSON in the 'submissions' field.
   b. Change the column names of that resulting dataframe to 'submissions.<column_name>'.
   c. Add a column with value equal to the application number of the row you are evaluating.
   d. Add that resulting dataframe to a list, collecting all such dataframes.
3. Concatenate those dataframes.
4. Merge the original dataframe with the concatenated one using 'application_number' as the key, and drop the submissions column.

Repeat the process for the 'products'; however, unless you know the relationship between submissions and products, there is no clear way of merging the dataframes you get:

If they have no relationship except for being under the same application number, you basically get separate datasets for each.
If there is a one-to-one relationship, you can just merge them by index (concatenate each line)

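In code:

import pandas as pd

# raw_data: the list of records loaded from the JSON file.
# Assumes every record has both a 'submissions' and a 'products' list.
df = pd.json_normalize(raw_data)

submissions = []
products = []

for _, line in df.iterrows():
    # flatten this record's submissions and prefix the column names
    temp_df_sub = pd.json_normalize(line['submissions'])
    temp_df_sub.columns = [f'submissions.{c}' for c in temp_df_sub.columns]
    temp_df_sub['application_number'] = line['application_number']
    submissions.append(temp_df_sub)

    # same for this record's products
    temp_df_prod = pd.json_normalize(line['products'])
    temp_df_prod.columns = [f'products.{c}' for c in temp_df_prod.columns]
    temp_df_prod['application_number'] = line['application_number']
    products.append(temp_df_prod)

# ignore_index gives each combined frame a clean 0..n-1 index
submissions_df = pd.concat(submissions, ignore_index=True)
products_df = pd.concat(products, ignore_index=True)

# the list columns are now redundant
df = df.drop(columns=['submissions', 'products'])


# if one-to-one relationship between submissions and products
# (rows are paired by position, so keep only one application_number column)
sub_prod_df = pd.concat(
    [submissions_df, products_df.drop(columns='application_number')], axis=1
)
final_df = df.merge(sub_prod_df, on='application_number')


# if no relationship
final_sub_df = submissions_df.merge(df, on='application_number')
final_prod_df = products_df.merge(df, on='application_number')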
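As a possible alternative to the explicit loop, json_normalize can build the per-list frames itself through its record_path, meta and record_prefix parameters. The sketch below is not the approach described above, only a shorter route to the same two frames. The helper name flatten_child and the sample field values are hypothetical.

```python
import pandas as pd

# Hypothetical records; values and inner field names are invented
records = [
    {
        "application_number": "A001",
        "submissions": [{"submission_type": "ORIG"}, {"submission_type": "SUPPL"}],
        "products": [{"brand_name": "X"}],
    },
    {
        "application_number": "A002",
        "submissions": [{"submission_type": "ORIG"}],
        "products": [{"brand_name": "Y"}, {"brand_name": "Z"}],
    },
]


def flatten_child(records, key, meta):
    """Return one row per item of each record's `key` list, with prefixed columns."""
    return pd.json_normalize(
        records,
        record_path=key,          # name of the list field to expand
        meta=meta,                # parent fields to repeat on every row
        record_prefix=f"{key}.",  # e.g. 'submissions.submission_type'
    )


submissions_df = flatten_child(records, "submissions", ["application_number"])
products_df = flatten_child(records, "products", ["application_number"])
```

Note that json_normalize expects every record to contain the record_path key, so records with a missing 'submissions' or 'products' field would need to be filtered out or given an empty list first.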
