I have a script that calls an API (using Requests) to pull financial data for a list of stocks. It reads the data into pandas dataframes, does a transformation, then uploads the pulled data into Postgres using psycopg2. There are 10-150 API calls that need to be made per ticker. Each iteration takes a few seconds.

Given that the list of tickers is over 2k long, this script takes about a day to run while the CPU is only utilized at 4% capacity. I want to change the script so that it uses aiohttp (https://aiohttp.readthedocs.io/en/stable/client_quickstart.html) to make all the API calls needed for each ticker at the same time. Then, as each request returns, it can be transformed and loaded into Postgres. My hope is that it will significantly cut down the time it takes to process each ticker by increasing the work done by the CPU.

I've looked at the documentation for aiohttp and async/await, but I'm having a hard time wrapping my head around how to structure an asynchronous loop. I am also unsure how to make sure that, as each API request returns, it immediately kicks off the pandas/Postgres upload instead of waiting for all API calls to return before moving on.
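
One possible structure, as a minimal sketch using only the standard library: put the fetch and the transform/load for a single request into one coroutine, then run one such coroutine per URL with asyncio.gather. Each coroutine moves on to its own processing step as soon as its own response arrives, without waiting for the others. The names fetch, transform_and_load, fetch_then_load, process_ticker and the max_in_flight limit are hypothetical; fetch only sleeps to simulate a request, and asyncio.to_thread assumes Python 3.9 or newer.

```python
import asyncio
import random


async def fetch(url):
    """Hypothetical stand-in for an HTTP request; sleeps instead of calling an API."""
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return {"url": url, "values": [1, 2, 3]}


def transform_and_load(payload):
    """Hypothetical stand-in for the blocking pandas / Postgres step."""
    return len(payload["values"])


async def fetch_then_load(url, limiter):
    # The semaphore caps how many requests are in flight at once.
    async with limiter:
        payload = await fetch(url)
    # Runs as soon as *this* response is back, independent of the other tasks.
    # to_thread keeps the blocking work off the event loop.
    rows = await asyncio.to_thread(transform_and_load, payload)
    return url, rows


async def process_ticker(urls, max_in_flight=10):
    limiter = asyncio.Semaphore(max_in_flight)
    tasks = [fetch_then_load(u, limiter) for u in urls]
    # gather runs every coroutine concurrently and returns results in input order.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    urls = [f"http://example.invalid/stmt/{i}" for i in range(5)]
    print(asyncio.run(process_ticker(urls)))
```

The semaphore is optional, but firing 150 requests at once may trip an API's rate limit, so some cap is probably worth having.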

#here is a simplified example of the code for one ticker
import asyncio
import json
import pandas as pd
import psycopg2 as pg
import requests as rq


tkr = 'foo'
api_key = 'bar'
url = 'http://api_call_website.com?{0}&{1}&{2}&{3}&api-key:{4}'
list_of_stmnts = [
    (True, 2009, 'Q4', 'pl'),
    (True, 2018, 'Q3', 'cf'),
    (True, 2018, 'Q2', 'cf'),
    (True, 2017, 'Q4', 'cf'),
]

#these "statements" contain the parameters that get passed into the url

async def async_get_loop(list_of_stmnts, tkr, url, api_key):
    #this builds the list of urls that need to be called
    urls = [url.format(tkr, stmt[1], stmt[2], stmt[3], api_key) for stmt in list_of_stmnts]
    for u in urls:
        #rq.request is a blocking call and cannot be awaited;
        #this is the step I want to replace with aiohttp
        data = rq.request("GET", u)
        results = data.json()
        df = pd.DataFrame(results['values'])
        #the postgres engine is defined earlier in the code using psycopg2
        df.to_sql('table_name', engine, schema='schema', if_exists='append', index=False)

This shows my rudimentary grasp of how async/await should work. I know that to make it asynchronous, I need to use aiohttp instead of Requests. But frankly, I'm lost as to how to use these two packages.
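
For reference, a minimal sketch of what the aiohttp counterpart of a Requests call might look like, assuming the API returns JSON with a JSON content type. The function names fetch_json and fetch_all are hypothetical; ClientSession, session.get, raise_for_status and resp.json() are aiohttp's own client API.

```python
import asyncio

import aiohttp


async def fetch_json(session, url):
    # session.get plays the role of requests.get; awaiting the body yields
    # control so that other requests can make progress in the meantime.
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.json()


async def fetch_all(urls):
    # One ClientSession is shared by all requests so connections are pooled.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_json(session, u) for u in urls))
```

Calling asyncio.run(fetch_all(urls)) from ordinary synchronous code would start the event loop and return the decoded responses in the same order as urls.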
