Taking OpenCorporates API data into a structured CSV


I'm struggling to figure out how to use pandas to pull data from the OpenCorporates API and write it to a CSV file. I'm not sure where I'm going wrong.

import pandas as pd
df = pd.read_json('https://api.opencorporates.com/companies/search?q=pwc')
data = df['companies']['company'][0]
result = {'name':data['timestamp'],
      'company_number':data[0]['company_number'],
      'jurisdiction_code':data[0]['jurisdiction_code'],
      'incorporation_date':data[0]['incorporation_date'],
      'dissolution_date':data[0]['dissolution_date'],
      'company_type':data[0]['company_type'],
      'registry_url':data[0]['registry_url'],
      'branch':data[0]['branch'],
      'opencorporates_url':data[0]['opencorporates_url'],
      'previous_names':data[0]['previous_names'],
      'source':data[0]['source'],
      'url':data[0]['url'],
      'registered_address':data[0]['registered_address'],
     }
df1 = pd.DataFrame(result, columns=['name', 'company_number', 'jurisdiction_code', 'incorporation_date', 'dissolution_date', 'company_type', 'registry_url', 'branch', 'opencorporates_url', 'previous_names', 'source', 'url', 'registered_address'])
df1.to_csv('company.csv', index=False, encoding='utf-8')

There are 2 answers

1
user3471881 (best answer)

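Get the JSON data with requests, then use pd.json_normalize (pandas.io.json.json_normalize in pandas versions before 1.0) to flatten the response. The company records sit under json_data["results"]["companies"], and each one is wrapped in a "company" key.

import pandas as pd
import requests

# Fetch the search results and parse the JSON body.
response = requests.get('https://api.opencorporates.com/companies/search?q=pwc')
response.raise_for_status()
json_data = response.json()

# Unwrap each record from its "company" key, then flatten them all in one call.
# Nested dicts become dotted column names.
companies = [row["company"] for row in json_data["results"]["companies"]]
df = pd.json_normalize(companies)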

You can then write the DataFrame to a CSV file with the df.to_csv() method, as in the question.
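To try the flatten-then-export step without calling the API, here is a minimal sketch that runs on a made-up payload. The sample dict only imitates the results / companies / company nesting used in this answer; its field values are invented, and companies_to_frame is a hypothetical helper name. It assumes pandas 1.0 or later, where pd.json_normalize is available.

```python
import pandas as pd

# Made-up payload imitating the nesting used in the answer:
# results -> companies -> [{"company": {...}}, ...]. Values are invented.
sample = {
    "results": {
        "companies": [
            {"company": {
                "name": "PWC LTD",
                "company_number": "0001",
                "jurisdiction_code": "gb",
                "registered_address": {"locality": "London", "country": "UK"},
                "source": {"publisher": "Example Registry"},
            }},
            {"company": {
                "name": "PWC INC",
                "company_number": "0002",
                "jurisdiction_code": "us_de",
                "registered_address": {"locality": "Dover", "country": "US"},
                "source": {"publisher": "Example Registry"},
            }},
        ]
    }
}


def companies_to_frame(payload):
    """Hypothetical helper: unwrap each "company" record and flatten it."""
    companies = [row["company"] for row in payload["results"]["companies"]]
    # Nested dicts become dotted columns such as "registered_address.locality".
    return pd.json_normalize(companies)


df = companies_to_frame(sample)

# With no path argument, to_csv returns the CSV as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a path, for example df.to_csv('company.csv', index=False, encoding='utf-8') as in the question.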

0
Mollie Hanley

It might be easier for you to access the OpenCorporates database in bulk.

OpenCorporates provides access for commercial users under a closed licence, and as open data for journalists, academics and NGOs who are able to share the results under a share-alike, open data licence. The licence is available here: https://opencorporates.com/info/licence