I started learning Python this week, and as a first project I chose to develop a simple application that fetches data from the Riot API, processes it, and inserts it into a MySQL database. I got it working synchronously with no problems, which is already enough for the project: the API has a rate limit of 20 calls per second and 100 calls per 120 seconds, and the rate limit was already reached within 40 seconds.
However, I wanted to improve it, since it is possible to get better keys with rate limits well above my current one.
I tried multi-threading by setting up a thread pool executor with 5 workers, where each worker would fetch a match, get a connection from a pool, insert the data into the DB, and then take the next available match. This approach didn't look promising, since it performed many inserts of small chunks of data. My code used a single fetch function that received a URL generated by other functions. The code structure was something like this:

    def fetch(url):
        while True:
            response = session.get(url, headers={"X-Riot-Token": f"{api_key}"})
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                retry_after = int(response.headers.get("Retry-After", 1))
                time.sleep(retry_after+1)
            else:
                return None

The real structure is almost the same, with a try/except inside the while loop. The while loop ensured that no request was skipped: if a request was rate limited, it would simply wait until the timeout was over and try again.
This worked with multi-threading. I also stored the Retry-After value in a global and checked whether the client was rate limited before making a request, but this wasn't fast enough: some requests were still sent after the rate limit kicked in, which can get the key blacklisted on this API.
I then tried asyncio with aiohttp, and it was much faster, but both the 20 requests per second and the 100 requests per 120 seconds limits were reached, and the requests kept running.
I read about semaphores, but I didn't quite understand how they would work here. Should I create two semaphores, one for each rate limit, acquire both on each request, do an asyncio.sleep, and then release them accordingly? Would the next requests still go out? I saw other posts here, but none of them covered how to work with two different rate limits.
The basic structure would be: sync request to get a match list -> async request for each match to get the respective data -> async request to get info about each player of the match -> insert everything into the database after processing.
A semaphore is more useful for restricting the number of simultaneous tasks. For rate limiting, you could create a simple counter that blocks when it reaches 0 and resets when the time window ends. (I've also used Gubernator as an external service for rate limiting, which could be useful for more complex situations.)
So, as a quick attempt, maybe something like:
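A minimal sketch of that counter idea, assuming a fixed time window that starts at the first call. The names `RateLimiter`, `limited`, and `fake_fetch` are hypothetical, and `fake_fetch` only stands in for the real aiohttp request. Both limits are handled by making every request wait on each limiter in turn:

```python
import asyncio
import time


class RateLimiter:
    """Allow at most `max_calls` acquisitions per `period` seconds.

    Fixed-window counter: it counts down to 0, then blocks until the
    current window ends and starts a fresh one.
    """

    def __init__(self, max_calls: int, period: float) -> None:
        self.max_calls = max_calls
        self.period = period
        self._remaining = max_calls
        self._window_end = None  # set when the first call arrives
        self._lock = asyncio.Lock()  # waiters queue up behind this lock

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            if self._window_end is None or now >= self._window_end:
                # The previous window is over (or this is the first call).
                self._remaining = self.max_calls
                self._window_end = now + self.period
            if self._remaining == 0:
                # Out of calls: sleep until the window ends, then reset.
                await asyncio.sleep(self._window_end - now)
                self._remaining = self.max_calls
                self._window_end = time.monotonic() + self.period
            self._remaining -= 1


async def limited(limiters, coro_fn, *args):
    """Wait for a slot in every limiter, then run the request."""
    for limiter in limiters:
        await limiter.acquire()
    return await coro_fn(*args)


async def fake_fetch(match_id):
    # Placeholder for the real HTTP call (e.g. an aiohttp session.get).
    await asyncio.sleep(0)
    return match_id


async def main():
    # The two limits from the question: 100 per 120 s and 20 per 1 s.
    limiters = [RateLimiter(100, 120), RateLimiter(20, 1)]
    jobs = (limited(limiters, fake_fetch, i) for i in range(30))
    results = await asyncio.gather(*jobs)
    print(f"fetched {len(results)} matches")


if __name__ == "__main__":
    asyncio.run(main())
```

Whether these windows line up with how the API counts its own is an assumption, so it is probably worth keeping the 429 / Retry-After handling as a fallback.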
To make it more robust, maybe there should be a queue used so that requests are always made in the same order.
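One rough way to do that, as a sketch only: a single dispatcher task pulls jobs from an `asyncio.Queue` in FIFO order, waits for a rate-limit slot, and only then starts the request. The names `dispatcher`, `acquire`, and `fake_fetch` are hypothetical, and `acquire` here is just a short sleep standing in for whatever limiter is actually used:

```python
import asyncio


async def dispatcher(queue: asyncio.Queue, acquire):
    """Start queued requests strictly in the order they were enqueued."""
    running = []
    while True:
        item = await queue.get()
        if item is None:  # sentinel: no more work will arrive
            break
        coro_fn, args = item
        await acquire()  # wait for a rate-limit slot before starting
        running.append(asyncio.create_task(coro_fn(*args)))
    # Requests still overlap once started; only their start order is fixed.
    return await asyncio.gather(*running)


async def main():
    started = []

    async def fake_fetch(match_id):
        # Placeholder for the real HTTP request.
        started.append(match_id)
        await asyncio.sleep(0.01)
        return match_id * 2

    async def acquire():
        # Placeholder for a real rate limiter.
        await asyncio.sleep(0.005)

    queue = asyncio.Queue()
    worker = asyncio.create_task(dispatcher(queue, acquire))
    for match_id in range(5):
        queue.put_nowait((fake_fetch, (match_id,)))
    queue.put_nowait(None)  # tell the dispatcher to finish
    results = await worker
    return started, results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because only the dispatcher ever waits on the limiter, there is no competition between tasks for slots, at the cost of one extra moving part.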
It's probably also worth looking at existing libraries (a quick search finds aiolimiter and asynciolimiter), though I'm unsure how well these will play with multiple limits (e.g. if 2 limiters can be combined in a reliable manner).