Get intersection of 4 JSON files based on 1-2 common key values? (Python)


Below are 4 JSON files:

  • 3 JSON files have 3 key fields: name, rating, and year
  • 1 JSON has only 2 key fields: name, rating (no year)
[
  {
    "name": "Apple",
    "year": "2014",
    "rating": "21"
  },
  {
    "name": "Pear",
    "year": "2003",
    "rating": ""
  },
  {
    "name": "Pineapple",
    "year": "1967",
    "rating": "60"
  }
]
[
  {
    "name": "Pineapple",
    "year": "1967",
    "rating": "5.7"
  },
  {
    "name": "Apple",
    "year": "1915",
    "rating": "2.3"
  },
  {
    "name": "Apple",
    "year": "2014",
    "rating": "3.7"
  }
]
[
  {
    "name": "Apple",
    "year": "2014",
    "rating": "2.55"
  }
]
[
  {
    "name": "APPLE",
    "rating": "+4"
  },
  {
    "name": "LEMON",
    "rating": "+3"
  }
]

When you search for 'Apple' across all 4 files, you want to return 1 name, 1 year, and 4 ratings:

name: Apple (closest match to search term across all 4 files)
year: 2014 (the MOST COMMON year for Apple across first 3 JSONs)
rating:  21 (from JSON1)
        3.7 (from JSON2)
       2.55 (from JSON3)
         +4 (from JSON4)

Now pretend JSON3 (or any JSON) has no match for 'name: Apple'. In that case, instead return the following. Assume there will be at least one match in at least one file.

name: Apple (closest match to search term across all 4 files)
year: 2014 (the MOST COMMON year for Apple across first 3 JSONs)
rating:  21 (from JSON1)
        3.7 (from JSON2)
  Not Found (from JSON3)
         +4 (from JSON4)

How would you get this output in Python?

This question is similar to the example code in "Python - Getting the intersection of two Json-Files", except there are 4 files, 1 file is missing the year key, and we don't need the intersection of the rating key's value.

Here's what I have so far, using just the first two JSON files above:

import json

with open('1.json', 'r') as f:
  json1 = json.load(f)

with open('2.json', 'r') as f:
  json2 = json.load(f)

json2[0]['name'] = list(set(json2[0]['name']) - set(json1[0]['name']))

print(json.dumps(json2, indent=2))

I get output from this, but it doesn't match what I'm trying to achieve. For example, this is part of the output:

  {
    "name": [
      "a",
      "n",
      "i",
      "P"
    ],
    "year": "1967",
    "rating": "5.7"
  },

There is 1 answer

Answer by EliKor

The set constructor expects an iterable and builds the set from the items that iterable yields. So when you make a set directly from a string, you end up with

name = set('Apple')
# name = {'A', 'p', 'l', 'e'}  (a set drops the duplicate 'p'; element order is arbitrary)

since the string is an iterable object made up of characters. Instead, you would want to wrap the string into a list or tuple like so

name = set(['Apple'])
# name = {'Apple'}

which in your case would look like

json2[0]['name'] = list(set([json2[0]['name']]) - set([json1[0]['name']]))

but I still don't think that this is really what you are trying to achieve. Instead, I would suggest iterating through each of your JSON files and building your own dictionary keyed on the names from the files. Each value in that dictionary would be another dictionary with two keys, rating and year, each holding a list of values. Once the dictionary is built, you have a rating list and a year list for each name, and you can reduce each year list to a single value by choosing the most frequent year in it. Since the names differ in case between files (Apple vs. APPLE), you would probably want to compare them case-insensitively when grouping. Here's an example of how your dictionary might look

{
  "Apple": { "rating": [21, 3.7, ...], "year": [1915, 2014, 2014] },
  "Pineapple": ...
  ...
}
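
The approach described above could be sketched roughly as follows. This is only one possible reading of the question, not a definitive implementation. It groups the matches per file instead of in one flat list, so that a file with no match can be reported as "Not Found". The function name `search` is made up for illustration. Three things are assumptions: matching names case-insensitively, picking the most common spelling as the returned name, and preferring the record whose year equals the most common year when a file has several matches. The data is inlined here; in practice each list would come from `json.load`.

```python
from collections import Counter

# Inline copies of the four files from the question (normally read with json.load).
json1 = [
    {"name": "Apple", "year": "2014", "rating": "21"},
    {"name": "Pear", "year": "2003", "rating": ""},
    {"name": "Pineapple", "year": "1967", "rating": "60"},
]
json2 = [
    {"name": "Pineapple", "year": "1967", "rating": "5.7"},
    {"name": "Apple", "year": "1915", "rating": "2.3"},
    {"name": "Apple", "year": "2014", "rating": "3.7"},
]
json3 = [{"name": "Apple", "year": "2014", "rating": "2.55"}]
json4 = [
    {"name": "APPLE", "rating": "+4"},
    {"name": "LEMON", "rating": "+3"},
]


def search(term, files):
    """Hypothetical helper: one name, one year, and one rating per file."""
    key = term.casefold()
    # For each file, keep the records whose name equals the term, ignoring case.
    matches = [[r for r in records if r["name"].casefold() == key]
               for records in files]

    # Most frequent year among matching records that actually have a year.
    years = Counter(r["year"] for hits in matches for r in hits if "year" in r)
    year = years.most_common(1)[0][0] if years else None

    # Assumption: report the most common spelling of the matched name.
    names = Counter(r["name"] for hits in matches for r in hits)
    name = names.most_common(1)[0][0] if names else term

    ratings = []
    for hits in matches:
        # Assumption: prefer the record whose year equals the chosen year;
        # otherwise fall back to the file's first match (e.g. a file without years).
        fallback = hits[0] if hits else None
        chosen = next((r for r in hits if r.get("year") == year), fallback)
        ratings.append(chosen["rating"] if chosen else "Not Found")

    return {"name": name, "year": year, "rating": ratings}


result = search("Apple", [json1, json2, json3, json4])
print(result)
# {'name': 'Apple', 'year': '2014', 'rating': ['21', '3.7', '2.55', '+4']}
```

If a file has no match, for example when an empty list is passed in place of `json3`, the corresponding entry in the rating list becomes "Not Found" while the other ratings are unchanged.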