Im trying to accelerate multiple get requests to a web service using asyncio and aiohttp.

For that im fetching my data from a postgresql database using psycopg2 module .fetchmany() inside a function and constructing a dictionary of 100 records to send as lists of dictionary urls to an async function named batch() . batch by batch process.

The problem im facing in batch() function is that some requests are logging the message below although the script continues and dont fail but im not able to catch and log this exceptions to later reprocess them.

Task exception was never retrieved
future: <Task finished coro=<batch.<locals>.fetch() done, defined at C:/PythonProjects/bindings/batch_fetch.py:34> exception=ClientOSError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)>
Traceback (most recent call last):
  File "C:/PythonProjects/bindings/batch_fetch.py", line 36, in fetch
    async with session.get(url) as resp:
  File "C:\Miniconda3\lib\site-packages\aiohttp\client.py", line 1005, in __aenter__
    self._resp = await self._coro
  File "C:\Miniconda3\lib\site-packages\aiohttp\client.py", line 497, in _request
    await resp.start(conn)
  File "C:\Miniconda3\lib\site-packages\aiohttp\client_reqrep.py", line 844, in start
    message, payload = await self._protocol.read()  # type: ignore  # noqa
  File "C:\Miniconda3\lib\site-packages\aiohttp\streams.py", line 588, in read
    await self._waiter
aiohttp.client_exceptions.ClientOSError: [WinError 10054] An existing connection was forcibly closed by the remote host
Task exception was never retrieved
future: <Task finished coro=<batch.<locals>.fetch() done, defined at C:/PythonProjects/bindings/batch_fetch.py:34> exception=ClientConnectorError(10060, "Connect call failed ('xx.xxx.xx.xxx', 80)")>
Traceback (most recent call last):
  File "C:\Miniconda3\lib\site-packages\aiohttp\connector.py", line 924, in _wrap_create_connection
    await self._loop.create_connection(*args, **kwargs))
  File "C:\Miniconda3\lib\asyncio\base_events.py", line 778, in create_connection
    raise exceptions[0]
  File "C:\Miniconda3\lib\asyncio\base_events.py", line 765, in create_connection
    yield from self.sock_connect(sock, address)
  File "C:\Miniconda3\lib\asyncio\selector_events.py", line 450, in sock_connect
    return (yield from fut)
  File "C:\Miniconda3\lib\asyncio\selector_events.py", line 480, in _sock_connect_cb
    raise OSError(err, 'Connect call failed %s' % (address,))
TimeoutError: [Errno 10060] Connect call failed ('xx.xxx.xx.xxx', 80)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:/PythonProjects/bindings/batch_fetch.py", line 36, in fetch
    async with session.get(url) as resp:
  File "C:\Miniconda3\lib\site-packages\aiohttp\client.py", line 1005, in __aenter__
    self._resp = await self._coro
  File "C:\Miniconda3\lib\site-packages\aiohttp\client.py", line 476, in _request
    timeout=real_timeout
  File "C:\Miniconda3\lib\site-packages\aiohttp\connector.py", line 522, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "C:\Miniconda3\lib\site-packages\aiohttp\connector.py", line 854, in _create_connection
    req, traces, timeout)
  File "C:\Miniconda3\lib\site-packages\aiohttp\connector.py", line 992, in _create_direct_connection
    raise last_exc
  File "C:\Miniconda3\lib\site-packages\aiohttp\connector.py", line 974, in _create_direct_connection
    req=req, client_error=client_error)
  File "C:\Miniconda3\lib\site-packages\aiohttp\connector.py", line 931, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host cms-uat.cme.in.here.com:80 ssl:None [Connect call failed ('xx.xxx.xx.xxx', 80)]

Im just entering into asyncio world as you can depict from my code, so all the advises on the full code approach for this scenario are very welcomme.

Thank you

full code below.

import psycopg2.extras
import asyncio
import json
from aiohttp import ClientSession
from aiohttp import TCPConnector

base_url = 'http://url-example/{}'

def query_db():
    urls = []
    # connection to postgres table , fetch data.
    conn = psycopg2.connect("dbname='pac' user='user' host='db'")
    cursor = conn.cursor('psycopg2 request', cursor_factory=psycopg2.extras.NamedTupleCursor)
    sql = "select gid, paid from table"
    cursor.execute(sql)
    while True:
        rec = cursor.fetchmany(100)

        for item in rec:
            record = {"gid": item.gid, "url": base_url.format(item.paid)}
            urls.append(record.get('url'))
        if not rec:
            break
        # send batch for async batch request
        batch(urls)
        # empty list of urls for new async batch request
        urls = []


def batch(urls):
    async def fetch(url):
        async with ClientSession() as session:
            async with session.get(url) as resp:
                if resp.status == 200:
                    response = await resp.json()
                    # parse the url to fetch the point address id.
                    paid = str(resp.request_info.url).split('/')[4].split('?')[0]
                    # build the dictionary with pa id and full response.
                    resp_dict = {'paid': paid, 'response': response}
                    with open('sucessful.json', 'a') as json_file:
                        json.dump(resp_dict, json_file)
                        json_file.write("\n")
                elif resp.status is None:
                    print(resp.status)
                elif resp.status != 200:
                    print(resp.status)
                    response = await resp.json()
                    # parse the url to fetch the paid.
                    paid = str(resp.request_info.url).split('/')[4].split('?')[0]
                    # build the dictionary with paid and full response.
                    resp_dict = {'paid': paid, 'response': response}
                    with open('failed.json', 'a') as json_file:
                        json.dump(resp_dict, json_file)
                        json_file.write("\n")

    loop = asyncio.get_event_loop()

    tasks = []

    for url in urls:
        task = asyncio.ensure_future(fetch(url))
        tasks.append(task)
    try:
        loop.run_until_complete(asyncio.wait(tasks))
    except Exception:
        print("exception consumed")


if __name__ == "__main__":
    query_db()

1 Answers

1
Mikhail Gerasimov On

Task exception was never retrieved

You see this warning when you've created some task, it finished with exception, but you never explicitly retrieved (awaited) for its result. Here's related doc section.

I bet in your case problem is with the line

loop.run_until_complete(asyncio.wait(tasks))

asyncio.wait() by default just waits when all tasks are done. It doesn't distinguish tasks finished normally or with exception, it just blocks until everything finished. In this case it's you job to retrieve exceptions from finished tasks and following part won't help you with this since asyncio.wait() will never raise an error:

try:
    loop.run_until_complete(asyncio.wait(tasks))
except Exception:
    print('...')  # You will probably NEVER see this message

If you want to catch error as soon as it happened in one of tasks I can advice you to use asyncio.gather(). By default it will raise first happened exception. Note however that it is you job to cancel pending tasks if you want their graceful shutdown.