I am communicating with the Google API via batch requests through its google-api-python-client. Batch requests have the following limitations:
- A batch request cannot contain more than 1000 requests,
- A batch request cannot carry a payload larger than 1 MB.
I have a list containing a random number of strings of random length, from which I need to construct batch requests while respecting the limitations above.
Does anyone know a good way to efficiently build chunks of that original list that can be submitted to the Google API? By 'efficiently' I mean not iterating through all the elements one by one while counting the payload size.
So far, this is what I have in mind: take at most 1000 items, build the request, and check the payload size. If it is bigger than 1 MB, take 500 items and check the size again. If the payload is still too big, take the first 250 items; if it is smaller, take 750 items. And so on, you get the logic. This way, one could find the right number of elements in fewer iterations than building the payload and checking it after each addition.
I really don't want to reinvent the wheel, so if anyone knows an efficient builtin/module for that, please let me know.
The body payload size can be calculated by calling _serialize_request once you have added the right number of requests to the instantiated BatchHttpRequest.
See also the Python API Client Library documentation on making batch requests.
Okay, it seems I created something that solves this issue; here's a draft of the idea in Python:
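A minimal sketch of the bisection, assuming the payload size can be approximated by the combined UTF-8 length of the strings. Apart from get_str_length, the names (find_slice_index, MAX_ITEMS, MAX_BYTES) are illustrative, and "1 MB" is assumed to mean 2**20 bytes; with the real client the measurement would come from the serialized batch body instead.

```python
MAX_ITEMS = 1000          # limit on requests per batch
MAX_BYTES = 1024 * 1024   # payload limit, assumed to be 2**20 bytes


def get_str_length(items):
    """Stand-in for measuring the payload: total UTF-8 size of the items."""
    return len("".join(items).encode("utf-8"))


def find_slice_index(items, max_items=MAX_ITEMS, max_bytes=MAX_BYTES, verbose=True):
    """Return the largest n such that items[:n] respects both limits."""
    low = 0                              # items[:low] is known to fit
    high = min(len(items), max_items)    # never take more than max_items
    while low < high:
        mid = (low + high + 1) // 2      # round up so the loop always progresses
        size = get_str_length(items[:mid])
        if verbose:
            print(f"trying {mid} items -> {size} bytes")
        if size <= max_bytes:
            low = mid                    # fits, so try a longer slice
        else:
            high = mid - 1               # too big, so try a shorter slice
    return low
```

Calling find_slice_index(data) returns the index n at which to cut, so data[:n] is the first chunk and data[n:] is what remains. A result of 0 means the first string alone is over the limit.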
This code does exactly what I tried to describe: it adjusts the lower and upper bounds on the list while measuring the size of the resulting (concatenated) string, and then returns the index at which the list should be sliced to get as close to the size limit as possible. This avoids the CPU overhead of building and measuring the payload one item at a time. Running the code prints the iterations it performs on the list.
The get_str_length function, the lists and the other functions can be replaced with the corresponding functionality in the API client, but this is the rough idea behind it. The code is not foolproof, but the solution should be something along these lines.
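To cover the whole list rather than a single cut, the search has to be repeated for each chunk. The sketch below is one possible way to do that, using the standard library's itertools.accumulate and bisect instead of a hand-written search. The name split_into_batches is hypothetical, and sizes are approximated by UTF-8 length. The overhead that the real client adds per request is ignored, so it is probably wise to leave a safety margin below 1 MB. Note that it measures every string once up front, which trades a single linear pass for very cheap lookups afterwards.

```python
from bisect import bisect_right
from itertools import accumulate


def split_into_batches(items, max_items=1000, max_bytes=1024 * 1024):
    """Yield consecutive slices of items that respect both limits."""
    sizes = [len(s.encode("utf-8")) for s in items]
    totals = list(accumulate(sizes))     # totals[i] is the size of items[:i + 1]
    start = 0
    while start < len(items):
        used = totals[start - 1] if start else 0
        # Index of the first item that would push this batch over the byte budget.
        end = bisect_right(totals, used + max_bytes, lo=start)
        end = min(end, start + max_items)
        if end == start:
            raise ValueError(f"item {start} alone exceeds {max_bytes} bytes")
        yield items[start:end]
        start = end
```

Each yielded slice can then be fed into its own batch request, for example with a loop such as `for chunk in split_into_batches(data): ...`.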