How to transform JSON International Financial Statistics into pandas data frame

441 views Asked by At

I am struggling with data from the International Monetary Fund, which is in JSON format. After inspecting some of the posts, I couldn't figure out how to do it.

What I tried

import requests
import pandas as pd
import json

# These are the variables I want to have as columns, plus setting a time index
var = ['NGDP_XDC', 'NCP_XDC', 'NCGG_XDC', 'NFI_XDC', 'NINV_XDC', 'NX_XDC', 
       'NM_XDC', 'NSDGDP_XDC', 'NGDP_R_K_IX', 'NGDP_D_IX']

# URL for the IMF JSON Restful Web Service,
# IFS database
base = 'http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/IFS/'
period = 'A'
country = 'MX'

var = 'NGDP_XDC+NCP_XDC+NCGG_XDC+NFI_XDC+NINV_XDC+NX_XDC+NM_XDC+NSDGDP_XDC+NGDP_R_K_IX+NGDP_D_IX'
    
time = '?startPeriod=1970&endPeriod=2019'

# Get data from the above URL using the requests package
url = base + period + '.' + country + '.' + var + '.' + time

response = requests.get(url)
dictr = response.json()

... so far so good... However, this is the step I am struggling with


flat = dictr['CompactData']['DataSet']['Series']

temp = pd.json_normalize(flat)
temp = temp.drop(columns=['@FREQ', '@REF_AREA', '@UNIT_MULT', '@BASE_YEAR'])

I was expecting a flat-file that I could pivot to my will. However, this is what I get


    @INDICATOR @TIME_FORMAT                                                Obs
0     NINV_XDC          P1Y  [{'@TIME_PERIOD': '1970', '@OBS_VALUE': '37.21...
1       NX_XDC          P1Y  [{'@TIME_PERIOD': '1970', '@OBS_VALUE':

Which I have no clue how to transform it into

year variable1 ... variableN

1970    10     ...    45
1980    20     ...    12
. 
.
.
2019    15     ...    10
2

There are 2 answers

0
baduker On

Maybe this will nudge you in the right direction.

The value of ['CompactData']['DataSet']['Series'] is a dict that contains a list of dicts as its value that you're after.

So you have to flatten this:

series = response['CompactData']['DataSet']['Series']
flat = [item for sublist in [i['Obs'] for i in series] for item in sublist]

Putting it all together:

import requests
import pandas as pd

# These are the variables I want to have as columns, plus setting a time index
var = [
    'NGDP_XDC', 'NCP_XDC', 'NCGG_XDC', 'NFI_XDC', 'NINV_XDC', 'NX_XDC',
    'NM_XDC', 'NSDGDP_XDC', 'NGDP_R_K_IX', 'NGDP_D_IX',
]

base = 'http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/IFS/'
period = 'A'
country = 'MX'
time = '?startPeriod=1970&endPeriod=2019'

# Get data from the above URL using the requests package
url = f"{base}{period}.{country}.{'+'.join(var)}.{time}"
response = requests.get(url).json()

series = response['CompactData']['DataSet']['Series']
flat = [item for sublist in [i['Obs'] for i in series] for item in sublist]
print(pd.DataFrame(flat))

Output:

    @TIME_PERIOD        @OBS_VALUE @OBS_STATUS
0           1970   37.210816346586         NaN
1           1971  35.6027864361386         NaN
2           1972   36.123021665698         NaN
3           1973  50.9603299629663         NaN
4           1974   80.992068185601         NaN
..           ...               ...         ...
[499 rows x 3 columns]
0
Jorge Alonso On

I implemented your nudge in a much less elegant way as I could not understand how to retrieve variable codes and time index from your procedure. This also works as well:

url = f"{base}{period}.{country}.{'+'.join(var)}.{time}"
response = requests.get(url).json()
series = response['CompactData']['DataSet']['Series']

nipa = pd.DataFrame(index=range(1970, 2020))
N = len(var)

for n in range(0, N):
    temp = pd.DataFrame(series[n]['Obs'], index=range(1970, 2020))
    temp = temp.drop(columns='@TIME_PERIOD')
    temp.rename(columns={'@OBS_VALUE': var[n]}, inplace=True)
    nipa = pd.merge(nipa, temp, left_index=True, right_index=True)