I understand that the GitHub Search API returns at most 1000 results per query, with at most 100 results per page. I therefore wrote the following to page through all 1000 results of a code search for the string torch:
import requests
for i in range(1,11):
    url = "https://api.github.com/search/code?q=torch +in:file + language:python&per_page=100&page="+str(i)
    headers = {
        'Authorization': 'xxxxxxxx'
    }
    response = requests.request("GET", url, headers=headers).json()
    try:
        print(len(response['items']))
    except:
        print("response = ", response)
Here is the output -
15
62
response = {'documentation_url': 'https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#secondary-rate-limits', 'message': 'You have exceeded a secondary rate limit. Please wait a few minutes before you try again.'}
- It seems to hit the secondary rate limit just after the second iteration
- The number of results per page isn't consistent. For instance, page 1 returned 15 results on this run, but if I run it again I get a different number. I expected 100 results per page.
Does there exist an efficient way to get all 1000 results from the Search API?
There are two things happening here:
The search API has different rate limits. See the GitHub Documentation:
I would recommend requesting fewer results per page to address the incomplete results.
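One way to tell whether a short page comes from a timed-out query rather than the true end of the results is the `incomplete_results` flag that the Search API returns alongside `total_count` and `items`. Here is a minimal sketch, assuming the response has already been parsed into a dict with `.json()`; `summarize_page` is a hypothetical helper name, not part of any library:

```python
def summarize_page(payload, per_page):
    """Describe one parsed Search API page.

    `payload` is assumed to be the dict produced by response.json().
    """
    items = payload.get("items", [])
    return {
        "returned": len(items),
        "total_count": payload.get("total_count"),
        # True when GitHub reports that the query did not finish in time.
        "incomplete": bool(payload.get("incomplete_results", False)),
        # Fewer items than requested on this page.
        "short_page": len(items) < per_page,
    }


# Hand-written sample payload (not real API output) shaped like page 1 above.
sample = {"total_count": 1000, "incomplete_results": True, "items": [{}] * 15}
print(summarize_page(sample, per_page=100))
```

If `incomplete` is true on a short page, retrying that page later or with a smaller `per_page` is probably more useful than moving on to the next one.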
You will also need to be very deliberate about the requests you're making, because the limits are low. Getting the full 1000 may be impossible without requesting a rate limit increase or implementing a very long backoff.
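To be deliberate about when to retry, it can help to read the rate limit headers on the failed response instead of guessing a delay. The sketch below assumes the `retry-after`, `x-ratelimit-remaining`, and `x-ratelimit-reset` headers that GitHub documents for rate-limited responses; `seconds_to_wait` is a hypothetical helper, and the 60-second base and 900-second cap are assumptions of mine, not GitHub values:

```python
import time


def seconds_to_wait(headers, attempt, now=None, base=60.0, cap=900.0):
    """Pick a wait time (in seconds) after a rate-limited response.

    `headers` is any mapping of header names to string values, for
    example `response.headers` from requests. `attempt` counts
    consecutive failures, starting at 0.
    """
    now = time.time() if now is None else now
    # Normalize header names so plain dicts work as well.
    h = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in h:
        # The server says exactly how long to wait.
        return float(h["retry-after"])
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        # The quota is used up: wait until the reset time (epoch seconds).
        return max(0.0, float(h["x-ratelimit-reset"]) - now)
    # No hint from the server: fall back to capped exponential backoff.
    return min(cap, base * 2 ** attempt)


print(seconds_to_wait({"Retry-After": "30"}, attempt=0))
print(seconds_to_wait({}, attempt=2))
```

In the request loop, the value returned here would be passed to `time.sleep` before retrying the same page.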
I modified your code to add a primitive exponential backoff, but this still doesn't produce the full 1000 results and takes a while: