How to reduce request time in a JSON or replace a dictionary key with a default one?

252 views Asked by At

I have a list of dictionaries and I'm filling it out as I search a JSON url. The problem is that JSON (provided by the Google Books API) is not always complete. This is a search for books and from what I saw, all of them have id, title and authors, but not all of them have imageLinks. Here's a JSON link as an example: Search for Harry Potter.

Note that it always returns 10 results, in this example there are 10 IDs, 10 titles, 10 authors, but only 4 imageLinks.

@app.route('/search', methods=["GET", "POST"])
@login_required
def search():
    if request.method == "POST":
        while True:
            try:
                seek = request.form.get("seek")
                url = f'https://www.googleapis.com/books/v1/volumes?q={seek}'
                response = requests.get(url)
                response.raise_for_status()
                search = response.json()
                seek = search['items']
                infobooks = []
                for i in range(len(seek)):
                    infobooks.append({
                        "book_id": seek[i]['id'],
                        "thumbnail": seek[i]['volumeInfo']['imageLinks']['thumbnail'],
                        "title": seek[i]['volumeInfo']['title'],
                        "authors": seek[i]['volumeInfo']['authors']
                    })
                return render_template("index.html", infobooks=infobooks)
            except (requests.RequestException, KeyError, TypeError, ValueError):
                continue
    else:
        return render_template("index.html")

The method I used and that I'm demonstrating above, I can find 10 imageLinks (thumbnails) but it takes a long time! Anyone have any suggestions for this request not take so long? Or some way I can insert a "Book Without Cover" image when I can't find an imageLink? (not what I would like, but it's better than having to wait for the results)

4

There are 4 answers

1
ARR On BEST ANSWER

Firstly your function will never result in 10 imageLinks as the api will always return the same results. So if you retrieved 4 imageLinks the first time it will the same the second time. Unless google updates the dataset, but that is out of your control.

The Google Books Api allows max to 40 results and has default of max 10 results. To increase that you can add the query parameter maxResults=40 where 40 can be any desired number equal or lower than 40. Here you can then decide to programmatically filter out all results without imageLinks, or to leave them and add a no results image url to them. Also not every result returns a list of authors, that has also been fixed in this example. Take no risks with third party api's always check on empty/null results because it can break your code. I have used .get to avoid any exceptions from occurring when processing json.

Although I have not added it in this example you can also use pagination which google books provides to paginate for even more results.

Example:

@app.route('/search', methods=["GET", "POST"])
@login_required
def search():
    if request.method == "POST":
        seek = request.form.get("seek")
        url = f'https://www.googleapis.com/books/v1/volumes?q={seek}&maxResults=40'
        response = requests.get(url)
        response.raise_for_status()
        results = response.json().get('items', [])
        infobooks = []
        no_image = {'smallThumbnail': 'http://no-image-link/image-small.jpeg', 'thumbnail': 'http://no-image-link/image.jpeg'}
        for result in results:
            info = result.get('volumeInfo', {})
            imageLinks = info.get("imageLinks")
            infobooks.append({
                "book_id": result.get('id'),
                "thumbnail": imageLinks if imageLinks else no_image,
                "title": info.get('title'),
                "authors": info.get('authors')
            })
        return render_template("index.html", infobooks=infobooks)
    else:
        return render_template("index.html")

Google Books Api Docs: https://developers.google.com/books/docs/v1/using

3
bobveringa On

From your question it was not immediately obvious what the problem is (hence the lack of engagement). After playing around with the code and the API for a bit, I now have a much better understanding of the issue.

The issue is that the Google books API does not always include an image thumbnail for each of the items.

Your current solution for this issue is to retry the entire search until all the fields have an image thumbnail. But think if this is really needed. Maybe you can split it up. In my testing I've seen that the books without image thumbnails often switch. Meaning that if you just keep retrying until all the results from the query have a thumbnail, it will take a long time.

The solution should attempt to query each book individually for the thumbnail. After X number of tries it should default to a 'image available', to avoid spamming the API.

As you already figured out in your post, you can get the volume ID of each book from the original search query. You can then use this API call to query each of those volumes individually.

I've created some code to validate that this works. And only one book does not have an image thumbnail at the end. This code still has a lot of room for improvement, but I'll leave that as an exercise for you.

import requests

# Max attempts to get an image
_MAX_ATTEMPTS = 5

# No Image Picture
no_img_link = 'https://upload.wikimedia.org/wikipedia/en/6/60/No_Picture.jpg'


def search_book(seek):
    url = f'https://www.googleapis.com/books/v1/volumes?q={seek}'
    response = requests.get(url)
    search = response.json()
    volumes = search['items']

    # Get ID's of all the volumes
    volume_ids = [volume['id'] for volume in volumes]

    # Storage for the results
    book_info_collection = []

    # Loop over all the volume ids
    for volume_id in volume_ids:

        # Attempt to get the thumbnail a couple times
        for i in range(_MAX_ATTEMPTS):
            url = f'https://www.googleapis.com/books/v1/volumes/{volume_id}'
            response = requests.get(url)
            volume = response.json()
            try:
                thumbnail = volume['volumeInfo']['imageLinks']['thumbnail']
            except KeyError:
                print(f'Failed for {volume_id}')
                if i < _MAX_ATTEMPTS - 1:
                    # We still have attempts left, keep going
                    continue
                # Failed on the last attempt, use a default image
                thumbnail = no_img_link
                print('Using Default')

            # Create dict with book info
            book_info = {
                "book_id": volume_id,
                "thumbnail": thumbnail,
                "title": volume['volumeInfo']['title'],
                "authors": volume['volumeInfo']['authors']
            }

            # Add to collection
            book_info_collection.append(book_info)
            break

    return book_info_collection


books = search_book('Harry Potter')
print(books)

0
bobveringa On

You have added that you want it to load fast. This means that you cannot do retries in python as any retry you do in python would mean longer page loading times.

This means that you have to do the loading in the browser. You can use the same method as for the pure python method. At first, you just use all the images in the request and make additional requests for all the volumes that did not have an image. This means that you have 2 endpoints, one for the volume_information. And another endpoint to just get the data for one volume.

Note that I am using the term volume instead of book, as that is what the Google API also uses.

Now, JavaScript is not my strong suit so the solution I am providing here has a lot of room for improvement.

I've made this example using flask. This example should help you implement your solution that fits your specific application.

Extra Note: In my testing I've noticed that, some regions more often respond with all thumbnails than others. The API sends different responses based on your IP address. If I set my IP to be in the US I often get all the thumbnails without retries. I am using a VPN to do this, but there may be other solutions.

app.py

import time

from flask import Flask, render_template, request, jsonify
import requests

app = Flask(__name__)


@app.route('/')
def landing():
    return render_template('index.html', volumes=get_volumes('Harry Potter'))


@app.route('/get_volume_info')
def get_volume_info_endpoint():
    volume_id = request.args.get('volume_id')
    if volume_id is None:
        # Return an error if no volume id was provided
        return jsonify({'error': 'must provide argument'}), 400

    # To stop spamming the API
    time.sleep(0.250)
    
    # Request volume data
    url = f'https://www.googleapis.com/books/v1/volumes/{volume_id}'
    response = requests.get(url)
    volume = response.json()

    # Get the info using the helper function
    volume_info = get_volume_info(volume, volume_id)
    
    # Return json object with the info
    return jsonify(volume_info), 200


def get_volumes(search):
    # Make request
    url = f'https://www.googleapis.com/books/v1/volumes?q={search}'
    response = requests.get(url)
    data = response.json()

    # Get the volumes
    volumes = data['items']

    # Add list to store results
    volume_info_collection = []

    # Loop over the volumes
    for volume in volumes:
        volume_id = volume['id']
        
        # Get volume info using helper function
        volume_info = get_volume_info(volume, volume_id)

        # Add it to the result
        volume_info_collection.append(volume_info)
    
    return volume_info_collection


def get_volume_info(volume, volume_id):
    # Get basic information
    volume_title = volume['volumeInfo']['title']
    volume_authors = volume['volumeInfo']['authors']
    
    # Set default value for thumbnail
    volume_thumbnail = None
    try:
        volume_thumbnail = volume['volumeInfo']['imageLinks']['thumbnail']
    except KeyError:
        # Failed we keep the None value
        print('Failed to get thumbnail')
    
    # Fill in the dict
    volume_info = {
        'volume_id': volume_id,
        'volume_title': volume_title,
        'volume_authors': volume_authors,
        'volume_thumbnail': volume_thumbnail
    }
    
    # Return volume info
    return volume_info


if __name__ == '__main__':
    app.run()

Template index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
    <script>
        let tracker = {}

        function get_thumbnail(id) {
            let url = '/get_volume_info?volume_id=' + id
            fetch(url).then(function (response) {
                return response.json();
            }).then(function (data) {
                console.log(data);
                return data['volume_thumbnail']
            }).catch(function () {
                console.log("Error");
            });
        }

        function image_load_failed(id) {
            let element = document.getElementById(id)

            if (isNaN(tracker[id])) {
                tracker[id] = 0
            }
            console.log(tracker[id])

            if (tracker[id] >= 5) {
                element.src = 'https://via.placeholder.com/128x196C/O%20https://placeholder.com/'
                return
            }

            element.src = get_thumbnail(id)
            tracker[id]++
        }
    </script>
</head>
<body>

<table>
    <tr>
        <th>ID</th>
        <th>Title</th>
        <th>Authors</th>
        <th>Thumbnail</th>
    </tr>
    {% for volume in volumes %}
        <tr>
            <td>{{ volume['volume_id'] }}</td>
            <td>{{ volume['volume_title'] }}</td>
            <td>{{ volume['volume_authors'] }}</td>
            <td><img id="{{ volume['volume_id'] }}" src="{{ volume['volume_thumbnail'] }}"
                     onerror="image_load_failed('{{ volume['volume_id'] }}')"></td>
        </tr>
    {% endfor %}

</table>

</body>
</html>
0
Henshal B On

Add a dummy image URL

"book_id": seek[i]['id'] or 'dummy_url'