Flattening of json file

90 views Asked by At

I am having problems while converting the json file to the dataframe. The json file structure is like this:

"results": [
    {
      "submissions": [
        {
          "submission_type": "SUPPL",
          "submission_number": "26",
          "submission_status": "AP",
          "submission_status_date": "20110902",
          "submission_class_code": "LABELING",
          "submission_class_code_description": "Labeling",
          "application_docs": [
            {
              "id": "39507",
              "url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2011/076175s026lbl.pdf",
              "date": "20120516",
              "type": "Label"
            }
          ]
        },
        {
          "submission_type": "SUPPL",
          "submission_number": "30",
          "submission_status": "AP",
          "submission_status_date": "20130726",
          "review_priority": "STANDARD",
          "submission_class_code": "LABELING",
          "submission_class_code_description": "Labeling",
          "application_docs": [
            {
              "id": "39508",
              "url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2013/076175s030lbl.pdf",
              "date": "20130729",
              "type": "Label"
            }
          ]
        },
        {
          "submission_type": "ORIG",
          "submission_number": "1",
          "submission_status": "AP",
          "submission_status_date": "20020220",
          "application_docs": [
            {
              "id": "18441",
              "url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2002/76175_Mefloquine Hydrochloride_Prntlbl.pdf",
              "date": "20031224",
              "type": "Label"
            },
            {
              "id": "22542",
              "url": "http://www.accessdata.fda.gov/drugsatfda_docs/anda/2002/076175_mefloquine_toc.cfm",
              "date": "20030804",
              "type": "Review"
            },
            {
              "id": "31095",
              "url": "http://www.accessdata.fda.gov/drugsatfda_docs/appletter/2002/76175.ap.pdf",
              "date": "20030411",
              "type": "Letter"
            }
          ]
        },
        {
     
      ],
      "application_number": "ANDA076175",
      "sponsor_name": "SANDOZ",`your text`
      "products": [
        {
          "product_number": "001",
          "reference_drug": "No",
          "brand_name": "MEFLOQUINE HYDROCHLORIDE",
          "active_ingredients": [
            {
              "name": "MEFLOQUINE HYDROCHLORIDE",
              "strength": "250MG"
            }
          ],
          "reference_standard": "No",
          "dosage_form": "TABLET",
          "route": "ORAL",
          "marketing_status": "Discontinued"
        }
      ]
    },

So far the code I wrote is :

   **df_flattened = pd.json_normalize(data=rawData['results'])

     df_flattened.tail()**

then to further normalise the data I am trying to do it on submissions and product colum :

**df_submissions = pd.json_normalize(rawData, record_path = rawData['results']['submissions'], meta = ['application_name', 'sponsor_name'])

df_submissions.head()**

but I am getting error saying:

TypeError Traceback (most recent call last) in ----> 1 df_submissions = pd.json_normalize(rawData, record_path = rawData['results']['submissions'], meta = ['application_name', 'sponsor_name']) 2 df_submissions.head() 3 TypeError: list indices must be integers or slices, not str

I am not able to convert the nested list of dictionaries in submissions and products. submission column and product column are nested list with dictionaries json_normalize doesn't work with them. I tried using it but was getting error. How can I convert this json file in to dataframe?

Any input on this will be helpful

1

There are 1 answers

0
Chinmay T On

I come to know across this library which reduces a lots of code and flatten the nested json objects to lowest level and make separated columns. For more information click here.

#https://pypi.org/project/flatten-json/
from flatten_json import flatten
with open('.\\Results.Json') as json_data:
    data = json.load(json_data)

print(data)

dic_flattened = [flatten(d) for d in data['results']]     
df = pd.DataFrame(dic_flattened)

print(df)

Output:

enter image description here