How to extract data from json file with multiple arrays?

I'm writing a small Flask app that accesses a JSON API. All the URLs in the JSON response should then be extracted and put into a list for further use. The code in question:

@app.route('/booro/<tags>') # <tags> acts as search
def process(tags):
    r=requests.get('https://example.com/post/index.json?'+tags+'&limit=1') #gives me a json file with only one array
    z = r.text[1:-1] # to remove "[" and "]" so that it can be loaded
    i = json.loads(z)
    m = i['file_url']
    return m

As expected, the code above shows just a single link on the generated page.

But if I set &limit to 2 or higher, I get this error message.

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.5/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/user/.local/lib/python3.5/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/home/user/.local/lib/python3.5/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/user/.local/lib/python3.5/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/user/.local/lib/python3.5/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/user/.local/lib/python3.5/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/user/.local/lib/python3.5/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/user/.local/lib/python3.5/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/user/.local/lib/python3.5/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/user/.local/lib/python3.5/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/user/Schreibtisch/FlaskProxy/maindrive.py", line 19, in process
    i = json.loads(z)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 1278 (char 1277)

The problem is that the downloaded json-file then contains multiple arrays that can't be parsed. It looks something like this:

{"id":1383185,"tags":"new main default","file_url": "example1.com"} {"id":1383185, "tags":"vivid original alternative","file_url": "example2.com"}

Is there a way to extract values with the same key across multiple arrays?

There is 1 answer

anmaxvl On BEST ANSWER

You shouldn't remove [ and ]: the response is a JSON array, and setting the limit to 1 simply returns an array with a single object in it. Parse the response content without modifications, i.e.

i = json.loads(r.text)

In case of limit==1 you will have:

[{'file_url': 'example1.com', ...}]

In case of limit==2 you will have:

[{'file_url': 'example1.com', ...}, {'file_url': 'example2.com', ...}]

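etc. Once [ and ] are removed, json.loads tries to treat the response as a single JSON object instead of a JSON array of objects. With a limit of 2 or more it finds more text after the first object and raises the "Extra data" error, because it is trying to parse something like: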

{"file_url": "example1.com", ...}, {"file_url": "example2.com", ...}

which is not valid JSON.
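
To make the difference concrete, here is a minimal sketch that uses only the standard library. The `raw` string is a made-up stand-in for the API response with limit=2, not real output from the service.

```python
import json

# Hypothetical payload that mimics the API response for limit=2.
raw = '[{"id": 1, "file_url": "example1.com"}, {"id": 2, "file_url": "example2.com"}]'

# Parsing the untouched text gives a list of dicts.
posts = json.loads(raw)
urls = [post["file_url"] for post in posts]

# Stripping the brackets leaves two objects separated by a comma,
# which is not a single JSON document, so the decoder gives up
# after the first object.
try:
    json.loads(raw[1:-1])
    error = None
except json.JSONDecodeError as exc:
    error = exc.msg
```

With limit=1 the stripped text happens to be a single valid object, which is why the original code only appeared to work in that case.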

That being said, you would have to do something like this:

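r = requests.get(...)
response_content = json.loads(r.text)  # parse the unmodified response (r.json() should give the same result)
m = []  # collects the file_url of every returned object
for obj in response_content:
    file_url = obj['file_url']
    m.append(file_url)  # or do something else with file_url here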
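
The loop can also be wrapped in a small helper, sketched here under a few assumptions. The name `extract_file_urls` is hypothetical, the sample strings are invented, and the check for a missing `file_url` key is a precaution that the question does not say is needed.

```python
import json


def extract_file_urls(response_text):
    """Return the file_url of every object in a JSON array response."""
    posts = json.loads(response_text)
    # Assumed guard: skip entries that have no "file_url" key.
    return [post["file_url"] for post in posts if "file_url" in post]


# Invented sample responses for limit=2 and limit=1.
sample = '[{"id": 1, "file_url": "example1.com"}, {"id": 2, "file_url": "example2.com"}]'
urls = extract_file_urls(sample)
single = extract_file_urls('[{"id": 1, "file_url": "example1.com"}]')
```

In the Flask view this would be called as `extract_file_urls(r.text)`. The view would probably need to return the list joined into a string (for example with `'\n'.join(...)`) rather than the list itself, depending on the Flask version.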