I am searching for GitHub files containing the string "torch". Since the search API only returns the first 1,000 results of a query, I am splitting the search by file size, as suggested here. However, I keep hitting the secondary rate limit. Could someone tell me whether I am doing something wrong, or whether there is a way to optimize my code to avoid these rate limits? I have already looked at the best practices for dealing with rate limits. Here is my code:
import requests
import httplink
import time

# Search for Python files containing the string "torch", one file-size window
# (250 bytes wide) at a time, for sizes from 0 to 500000
for i in range(0, 500000, 250):
    print("i = ", i, " i + 250 = ", i + 250)
    url = "https://api.github.com/search/code?q=torch+in:file+language:python+size:" + str(i) + ".." + str(i + 250) + "&page=1&per_page=10"
    headers = {"Authorization": "Token xxxxxxxxxxxxxxx"}  # Put your token here
    # Seconds to wait when the secondary rate limit is reached
    backoff = 256
    total = 0
    cond = True
    # This while loop goes over all pages of results => Pagination
    while cond:
        try:
            time.sleep(2)
            res = requests.request("GET", url, headers=headers)
            res.raise_for_status()
            data = res.json()
            # Use a separate index name so the outer loop variable i is not shadowed
            for n, item in enumerate(data["items"], start=total):
                print(f'[{n}] {item["html_url"]}')
            # Raises KeyError when the response has no Link header (handled below)
            link = httplink.parse_link_header(res.headers["link"])
            if "next" not in link:
                break
            total += len(data["items"])
            url = link["next"].target
        # Catch HTTP errors (e.g. the secondary rate limit), wait, and retry
        # the same URL so the computation does not stop
        except requests.exceptions.HTTPError as err:
            print("err = ", err)
            print("err.response.text = ", err.response.text)
            # backoff **= 2
            print("backoff = ", backoff)
            time.sleep(backoff)
        # No Link header: this size window has no further pages of results
        except KeyError as error:
            print("err = ", error)
            # Set cond to False to stop the while loop
            cond = False
            continue
Based on this answer, hitting the secondary rate limit seems to be a common occurrence. However, I was hoping someone could suggest a workaround.
I have added the Octokit tag, even though I am not using Octokit, to increase visibility, since this seems like a common problem.
A big chunk of the above logic/code was obtained from SO answers; I highly appreciate all the support from the community.
Note that search has its own primary and secondary rate limits, which are lower than those of other endpoints. For JavaScript, we have a throttle plugin that implements all the recommended best practices. For search, we limit requests to 1 per 2 seconds. Hope that helps!
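For Python, a rough equivalent of that throttling could look like the sketch below. It is not the throttle plugin's implementation, only an illustration under a few assumptions: the names `retry_delay`, `Throttle`, and `fetch` are hypothetical, `send` stands for any callable that returns an object with `status_code` and `headers` (a `requests.Response` would fit), and every 403 or 429 is treated as a rate-limit response, which is a simplification. The wait times follow GitHub's documented guidance: use `retry-after` if present, otherwise wait until `x-ratelimit-reset` when `x-ratelimit-remaining` is 0, otherwise wait at least a minute and back off exponentially.

```python
import time


def retry_delay(status_code, headers, attempt, now=None):
    """Return the seconds to wait before retrying, or None if no retry is needed.

    Assumption: any 403/429 is a rate-limit response (a simplification).
    """
    if status_code not in (403, 429):
        return None
    # Normalize header names so plain dicts work as well as requests' headers
    h = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in h:
        return float(h["retry-after"])
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        now = time.time() if now is None else now
        # x-ratelimit-reset is a UTC epoch timestamp in seconds
        return max(0.0, float(h["x-ratelimit-reset"]) - now)
    # No hint from the server: wait a minute, doubling on each further attempt
    return 60.0 * (2 ** attempt)


class Throttle:
    """Space consecutive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch(send, url, throttle, max_attempts=5, sleep=time.sleep):
    """Call send(url), throttled, retrying while the response looks rate limited."""
    for attempt in range(max_attempts):
        throttle.wait()
        response = send(url)
        delay = retry_delay(response.status_code, response.headers, attempt)
        if delay is None:
            return response
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

With `requests`, this could be used as `fetch(lambda u: session.get(u, headers=headers), url, Throttle(2.0))`, sharing one `Throttle` across all size windows and pages so that the spacing holds for the whole run. Requesting `per_page=100` instead of 10 would also reduce the number of calls needed for the same results.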