Python aiohttp Azure Function ClientConnector Error (works locally)


I'm working on a TimerTrigger Azure Function in Python that makes heavy use of the aiohttp library to make concurrent requests to a file cache, grab ~8K JSON files, and prepare them to be loaded into a database. I have been able to run the process end-to-end without issue on my local machine (OSX). That is to say, with Azure Functions Core Tools, I've been able to start the host with func start, trigger the job with a POST request to http://localhost:7071/admin/functions/NameOfMyFunction, and have everything work just fine.

However, when I publish this function to my Azure Functions App, the TimerTrigger kicks off as expected, but not too far into the process of concurrently fetching the JSON files, the function execution fails with this error (I've redacted the actual URL and IP address I'm hitting for confidentiality reasons):

Result: Failure
Exception: ClientConnectorError: Cannot connect to host https://FILE-CACHE-URL:443 ssl:default [Connect call failed ('XX.XXX.XXX.XXX', 443)]
Stack:
  File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py", line 370, in _handle__invocation_request
    call_result = await self._loop.run_in_executor(
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/azure-functions

Here are some critical excerpts from the actual code that I am running.

From the run.py entry point of the Azure Function:

import asyncio
import azure.functions as func

from helpers.doctor_info import fetch_doctor_profiles


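def main(myTimer: func.TimerRequest) -> None:
    # Run the async fetch to completion on a dedicated event loop,
    # and close the loop afterwards so it is not leaked between invocations.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        doc_profiles = loop.run_until_complete(fetch_doctor_profiles())
    finally:
        loop.close()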

From the doctor_info.py file, a helper function that is imported to fetch the profiles. Given a big list of the ~8K files I need to grab from the cache, it splits them into batches of 50 and fetches the files in each batch concurrently, allowing for pauses in between:

import asyncio

import aiohttp
from aiohttp import ClientSession


async def fetch_doctor_profiles(batch_size=50, max_concurrent_requests=15,
                                use_trust_env=True):
    json_list = []
    file_paths = get_cache_file_paths_from_manifest()
    path_batches = make_batches_of_paths(paths=file_paths, size=batch_size)
    # One semaphore is shared by every batch to cap concurrent requests.
    sem = asyncio.Semaphore(max_concurrent_requests)
    # ssl=False disables certificate verification; it replaces the
    # deprecated verify_ssl=False keyword.
    connector = aiohttp.TCPConnector(ssl=False)

    async with ClientSession(connector=connector, trust_env=use_trust_env) as session:
        json_batches = await asyncio.gather(
            *[fetch_jsons_in_batch(sem, session, batch) for batch in path_batches])
    for jsons in json_batches:
        unpack_fetched_profiles(profile_list=jsons, out_list=json_list)
    return json_list
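
For context, the batching and semaphore pattern used above can be sketched roughly as follows. This is a minimal illustration, not the real helpers: make_batches_of_paths here is an assumed implementation, fetch_batch_bounded is a hypothetical stand-in for fetch_jsons_in_batch, and the network call is replaced by a fake coroutine so the sketch runs without the file cache.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


def make_batches_of_paths(paths: List[str], size: int) -> List[List[str]]:
    """Assumed implementation: consecutive batches of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [paths[i:i + size] for i in range(0, len(paths), size)]


async def fetch_batch_bounded(
    sem: asyncio.Semaphore,
    fetch_one: Callable[[str], Awaitable[Dict[str, Any]]],
    batch: List[str],
    pause_seconds: float = 0.0,
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for fetch_jsons_in_batch: fetch one batch
    concurrently while the shared semaphore limits in-flight requests."""

    async def bounded(path: str) -> Dict[str, Any]:
        async with sem:  # waits here once the concurrency cap is reached
            return await fetch_one(path)

    results = await asyncio.gather(*(bounded(p) for p in batch))
    if pause_seconds:
        await asyncio.sleep(pause_seconds)  # optional pause after the batch
    return list(results)


async def _fake_fetch(path: str) -> Dict[str, Any]:
    """Placeholder for the real HTTP GET; no network access."""
    await asyncio.sleep(0)
    return {"path": path}


async def _demo() -> List[List[Dict[str, Any]]]:
    # Create the semaphore inside the running loop (matters on Python 3.8).
    sem = asyncio.Semaphore(15)
    paths = [f"doc_{i}.json" for i in range(120)]  # made-up file names
    batches = make_batches_of_paths(paths, size=50)
    return await asyncio.gather(
        *(fetch_batch_bounded(sem, _fake_fetch, b) for b in batches))


demo_batches = asyncio.run(_demo())
```

Because all batches are gathered at once, the semaphore (not the batch size) is what bounds the number of simultaneous connections in this sketch.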

As you may be able to see in the above excerpt, I originally thought that this might be an SSL handshake issue, and have experimented with disabling SSL validation with no luck: things continue to work fine when hosted on my laptop, but break in Azure.

Given that this always works locally but has never worked in deployment, I figure the root of the issue is some difference in environment once the process is hosted in the cloud, but I'm at a bit of a loss as to how to diagnose exactly what that difference is.
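
One way to start narrowing down an environment difference like this might be to take aiohttp out of the picture and check, from inside the deployed function, whether the host resolves and accepts a plain TCP connection at all. The sketch below uses only the standard library; probe_tcp is a hypothetical helper name, and the host passed to it would be the bare hostname of the file cache (no https:// prefix). It does not test TLS or proxies, only DNS and the TCP connect step that the error message refers to.

```python
import socket
import time
from typing import Any, Dict


def probe_tcp(host: str, port: int = 443, timeout: float = 5.0) -> Dict[str, Any]:
    """Hypothetical diagnostic helper: resolve `host`, then attempt one
    TCP connection, and report what happened instead of raising."""
    report: Dict[str, Any] = {
        "host": host,
        "port": port,
        "addresses": [],
        "connected": False,
        "error": None,
    }

    # Step 1: DNS resolution as seen from the current environment.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        report["addresses"] = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        report["error"] = f"DNS resolution failed: {exc}"
        return report

    # Step 2: plain TCP connect (no TLS), timed.
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            report["connected"] = True
    except OSError as exc:
        report["error"] = f"TCP connect failed: {exc}"
    report["elapsed_seconds"] = round(time.monotonic() - started, 3)
    return report
```

Logging the returned dictionary once from the local run and once from the deployed function would show whether the two environments resolve the same addresses and whether the connection is refused or times out in Azure.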

Happy to provide more detail, but figured this would be more than enough to start out. Thanks very much, and any help would be appreciated!
