I am having problems while converting the json file to the dataframe. The json file structure is like this:
"results": [
{
"submissions": [
{
"submission_type": "SUPPL",
"submission_number": "26",
"submission_status": "AP",
"submission_status_date": "20110902",
"submission_class_code": "LABELING",
"submission_class_code_description": "Labeling",
"application_docs": [
{
"id": "39507",
"url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2011/076175s026lbl.pdf",
"date": "20120516",
"type": "Label"
}
]
},
{
"submission_type": "SUPPL",
"submission_number": "30",
"submission_status": "AP",
"submission_status_date": "20130726",
"review_priority": "STANDARD",
"submission_class_code": "LABELING",
"submission_class_code_description": "Labeling",
"application_docs": [
{
"id": "39508",
"url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2013/076175s030lbl.pdf",
"date": "20130729",
"type": "Label"
}
]
},
{
"submission_type": "ORIG",
"submission_number": "1",
"submission_status": "AP",
"submission_status_date": "20020220",
"application_docs": [
{
"id": "18441",
"url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2002/76175_Mefloquine Hydrochloride_Prntlbl.pdf",
"date": "20031224",
"type": "Label"
},
{
"id": "22542",
"url": "http://www.accessdata.fda.gov/drugsatfda_docs/anda/2002/076175_mefloquine_toc.cfm",
"date": "20030804",
"type": "Review"
},
{
"id": "31095",
"url": "http://www.accessdata.fda.gov/drugsatfda_docs/appletter/2002/76175.ap.pdf",
"date": "20030411",
"type": "Letter"
}
]
},
{
],
"application_number": "ANDA076175",
"sponsor_name": "SANDOZ",`your text`
"products": [
{
"product_number": "001",
"reference_drug": "No",
"brand_name": "MEFLOQUINE HYDROCHLORIDE",
"active_ingredients": [
{
"name": "MEFLOQUINE HYDROCHLORIDE",
"strength": "250MG"
}
],
"reference_standard": "No",
"dosage_form": "TABLET",
"route": "ORAL",
"marketing_status": "Discontinued"
}
]
},
So far the code I wrote is :
**df_flattened = pd.json_normalize(data=rawData['results'])
df_flattened.tail()**
then to further normalise the data I am trying to do it on submissions and product colum :
**df_submissions = pd.json_normalize(rawData, record_path = rawData['results']['submissions'], meta = ['application_name', 'sponsor_name'])
df_submissions.head()**
but I am getting error saying:
TypeError Traceback (most recent call last) in ----> 1 df_submissions = pd.json_normalize(rawData, record_path = rawData['results']['submissions'], meta = ['application_name', 'sponsor_name']) 2 df_submissions.head() 3 TypeError: list indices must be integers or slices, not str
I am not able to convert the nested list of dictionaries in submissions and products. submission column and product column are nested list with dictionaries json_normalize doesn't work with them. I tried using it but was getting error. How can I convert this json file in to dataframe?
Any input on this will be helpful
I come to know across this library which reduces a lots of code and flatten the nested json objects to lowest level and make separated columns. For more information click here.
Output: