Python: append JSON to a JSON file in a while loop

I'm trying to get all users' information from the GitHub API using the Python Requests library. Here is my code:

import requests
import json

url = 'https://api.github.com/users'
token = "my_token"
headers = {'Authorization': 'token %s' % token}

r = requests.get(url, headers=headers)
users = r.json()
with open('users.json', 'w') as outfile:
    json.dump(users, outfile)

So far I can dump the first page of users into a JSON file. I can also find the URL of the 'next' page:

next_url = r.links['next'].get('url')
r2 = requests.get(next_url, headers=headers)
users2 = r2.json()

Since I don't know how many pages there are, how can I append the 2nd, 3rd, ... pages to 'users.json' sequentially in a while loop, as fast as possible?

Thanks!

There are 2 answers

Tuan Anh Hoang-Vu On BEST ANSWER

First, open the file in 'a' (append) mode. Reopening it in 'w' mode for each page would truncate the file and overwrite what was already written.

import requests
import json

url = 'https://api.github.com/users'
token = "my_token"
headers = {'Authorization': 'token %s' % token}

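# Open once in append mode; the with-block closes the file on exit
with open('users.json', 'a') as outfile:
    while url:
        r = requests.get(url, headers=headers)
        r.raise_for_status()  # stop on HTTP errors, e.g. when rate limited
        users = r.json()
        # Write one JSON array per line; the file is then a sequence of
        # JSON values, not a single JSON document
        json.dump(users, outfile)
        outfile.write('\n')
        # r.links has no 'next' entry when the response carries no such
        # link, so fall back to None, which ends the loop
        url = r.links.get('next', {}).get('url')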
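One thing to keep in mind with this approach: calling json.dump once per page in append mode leaves several JSON arrays back to back in the file, so a plain json.load on it will fail. Here is a minimal sketch of how such a file could be read back into one list, assuming every page is a JSON array. The helper names iter_json_values and load_appended_pages are made up for illustration.

```python
import json


def iter_json_values(text):
    """Yield each JSON value from a string of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed value and the index where it ended
        value, pos = decoder.raw_decode(text, pos)
        yield value


def load_appended_pages(path):
    """Read a file of appended JSON arrays and merge them into one list."""
    with open(path) as infile:
        text = infile.read()
    users = []
    for page in iter_json_values(text):
        users.extend(page)  # assumes each page is a JSON array
    return users
```

This works whether the arrays are written directly after one another or separated by newlines.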
John On

Append the data you get from the requests query to a list and move on to the next query.

Once you have all the data you want, write the combined list to a file or keep it as an object. You can also use threading to run multiple queries in parallel, but the API will most likely rate-limit you.
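A rough sketch of the list-based approach, in case it helps. It collects every page in memory and writes the file once at the end, so users.json ends up as a single valid JSON array. The function names fetch_all_pages and save_users are hypothetical, and the get parameter is an assumption of mine so that requests.get (or a stand-in for testing) can be passed in. It also assumes each page is a JSON array and that the last response has no 'next' link.

```python
import json


def fetch_all_pages(url, get, headers=None):
    """Follow 'next' links from url and collect all items in one list.

    get is expected to behave like requests.get: it returns an object
    with .json(), .links and .raise_for_status().
    """
    items = []
    while url:
        r = get(url, headers=headers)
        r.raise_for_status()   # fail early on HTTP errors such as rate limiting
        items.extend(r.json())  # assumes each page is a JSON array
        # No 'next' entry in r.links means there is nothing left to fetch
        url = r.links.get('next', {}).get('url')
    return items


def save_users(users, path):
    """Write the combined list to path as a single JSON document."""
    with open(path, 'w') as outfile:
        json.dump(users, outfile)


# Example call (performs real network requests, so it is commented out):
# import requests
# headers = {'Authorization': 'token %s' % "my_token"}
# users = fetch_all_pages('https://api.github.com/users', requests.get, headers)
# save_users(users, 'users.json')
```

Because each 'next' URL comes from the previous response, the pages have to be fetched one after another here. The trade-off is memory: everything is held in a list until the final write.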