Writing multiple JSON objects as one object to a single file with python

I'm using Python to hit the Foreman API and gather some facts about all the hosts that Foreman knows about. Unfortunately, there is no get-all-hosts-facts (or similar) call in the v1 Foreman API, so I'm having to loop through all the hosts and get the information one at a time. Doing so has led me to an annoying problem. Each call for a given host returns a JSON object like so:

{
  "host1.com": {
    "apt_update_last_success": "1452187711", 
    "architecture": "amd64", 
    "augeasversion": "1.2.0", 
    "bios_release_date": "06/03/2015", 
    "bios_vendor": "Dell Inc."
   }
}

This is totally fine on its own; the issue arises when I append the next host's information. I then get a JSON file that looks something like this:

{
  "host1.com": {
    "apt_update_last_success": "1452187711", 
    "architecture": "amd64", 
    "augeasversion": "1.2.0", 
    "bios_release_date": "06/03/2015", 
    "bios_vendor": "Dell Inc."
}
}{
"host2.com": {
    "apt_update_last_success": "1452703454", 
    "architecture": "amd64", 
    "augeasversion": "1.2.0", 
    "bios_release_date": "06/03/2015", 
    "bios_vendor": "Dell Inc."
   }
}

Here's the code that's doing this:

for i in hosts_data:
    log.info("Gathering host facts for host: {}".format(i['host']['name']))
    try:
        facts = requests.get(foreman_host+api+"hosts/{}/facts".format(i['host']['id']), auth=(username, password))
        if facts.status_code != 200:
            log.error("Unable to connect to Foreman! Got retcode '{}' and error message '{}'"
                      .format(facts.status_code, facts.text))
            sys.exit(1)
    except requests.exceptions.RequestException as e:
        log.error(e)
    facts_data = json.loads(facts.text)
    log.debug(facts_data)
    with open(results_file, 'a') as f:
        f.write(json.dumps(facts_data, sort_keys=True, indent=4))

Here's what I need the file to look like:

{
    "host1.com": {
        "apt_update_last_success": "1452187711",
        "architecture": "amd64",
        "augeasversion": "1.2.0",
        "bios_release_date": "06/03/2015",
        "bios_vendor": "Dell Inc."
    },
    "host2.com": {
        "apt_update_last_success": "1452703454",
        "architecture": "amd64",
        "augeasversion": "1.2.0",
        "bios_release_date": "06/03/2015",
        "bios_vendor": "Dell Inc."
    }
}

There are 3 answers

pault (accepted answer)

It would be better to assemble all of your data into one dict and then write it out once at the end, instead of on every pass through the loop.

d = {}
for i in hosts_data:
    log.info("Gathering host facts for host: {}".format(i['host']['name']))
    try:
        facts = requests.get(foreman_host+api+"hosts/{}/facts".format(i['host']['id']), auth=(username, password))
        if facts.status_code != 200:
            log.error("Unable to connect to Foreman! Got retcode '{}' and error message '{}'"
                      .format(facts.status_code, facts.text))
            sys.exit(1)
    except requests.exceptions.RequestException as e:
        log.error(e)
    facts_data = json.loads(facts.text)
    log.debug(facts_data)
    d.update(facts_data)  # merge this host's facts into the combined dict

# write everything at the end ('w' so a results file left over from a previous run isn't appended to)
with open(results_file, 'w') as f:
    f.write(json.dumps(d, sort_keys=True, indent=4))
Torkel Bjørnson-Langen

Instead of writing JSON inside the loop, insert the data into a dict with the correct structure, then write that dict out as JSON once the loop is finished.

This assumes your dataset fits into memory.
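A minimal sketch of that approach, reusing hosts_data and results_file from the question; get_host_facts() here is a hypothetical stand-in for the requests.get(...) call:

import json

all_facts = {}  # one dict keyed by hostname, built up across the loop
for i in hosts_data:
    # get_host_facts() is assumed to return a dict like {"host1.com": {...}}
    facts_data = get_host_facts(i['host']['id'])
    all_facts.update(facts_data)

# a single write at the end produces one valid JSON object
with open(results_file, 'w') as f:
    json.dump(all_facts, f, sort_keys=True, indent=4)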

ShadowRanger

For safety/consistency, you need to load in the old data, mutate it, then write it back out.

Change the current with block and write call to:

# If the file is guaranteed to exist, you can use r+ and avoid the initial seek
with open(results_file, 'a+') as f:
    f.seek(0)
    contents = f.read()
    combined_facts = json.loads(contents) if contents else {}  # an empty/new file starts as {}
    combined_facts.update(facts_data)
    f.seek(0)
    json.dump(combined_facts, f, sort_keys=True, indent=4)
    f.truncate()  # in case the new JSON encoding is smaller, e.g. due to a replaced key

Note: If possible, you want to use pault's answer to minimize unnecessary I/O; this is just how you'd do it if the data retrieval has to be done piecemeal, with an immediate update for each item as it becomes available.

FYI, the unsafe way is basically to find the trailing curly brace, delete it, then write out a comma followed by the new JSON (with the leading curly brace removed from its JSON representation). It's much less I/O intensive, but it's also less safe, doesn't clean out duplicates, doesn't sort the hosts, doesn't validate the input file at all, etc. So don't do it.
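For illustration only (this is exactly the splice the paragraph above warns against), a rough sketch of that unsafe approach might look like this; unsafe_append is a hypothetical helper, not anything from the question:

import json

def unsafe_append(results_file, facts_data):
    # Serialize just this host's entry and strip the outer braces.
    entry = json.dumps(facts_data, sort_keys=True, indent=4)
    inner = entry.strip()[1:-1].rstrip()
    with open(results_file, 'rb+') as f:
        contents = f.read()
        f.seek(contents.rindex(b'}'))   # byte offset of the trailing curly brace
        f.truncate()                    # drop the old closing brace
        f.write((',' + inner + '\n}').encode('utf-8'))  # splice in the new entry and re-close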