I am using the gcloud-python library for a project that needs to serve the following use case:
- Get a batch of entities with only a subset of their properties (projection)
gcloud.datastore.api.get_multi() provides me batch get but not projection, and gcloud.datastore.api.Query() provides me projection but not batch get (like an IN query).
AFAIK, GQLQuery provides both IN queries (batch get) and projections. Is there a plan to support GQL queries in the gcloud-python library? Or is there another way to get batching and projection in a single request?
Currently there is no way to request a subset of an entity's properties. When you have the list of keys that you need, you should use get_multi().
Projection Query Background
In Datastore, projection queries are simply index scans.
For example, consider you are writing the query SELECT * FROM MyKind ORDER BY myFirstProp, mySecondProp. This query will execute against an index: Index(MyKind, myFirstProp, mySecondProp). This index may look something like:
For each result in the index, Datastore then looks up the key associated with that index result. If you do a projection query where you project only myFirstProp or mySecondProp or both, Datastore can avoid doing the random access lookup to find the associated entity for each result. This is generally where you get the large performance gain from using projections, not from the savings of transporting less data over the network.
Likewise, if you know the list of keys that you need, you can look up the entities by key directly; there is no need to look in an index first.
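To make the index scan described above concrete, here is a minimal in-memory sketch. It is a toy model, not the Datastore implementation or the gcloud-python API: the class `ToyDatastore`, its methods, the sample keys, and the extra property `other` are all hypothetical. It only illustrates that a projection can be answered from index rows alone, while a full query pays one entity lookup per index row and a batch get skips the index entirely.

```python
# Toy model of an index scan. All names and data here are hypothetical.
ENTITIES = {
    "key1": {"myFirstProp": "a", "mySecondProp": 2, "other": "x"},
    "key2": {"myFirstProp": "a", "mySecondProp": 1, "other": "y"},
    "key3": {"myFirstProp": "b", "mySecondProp": 1, "other": "z"},
}


class ToyDatastore:
    def __init__(self, entities):
        self.entities = dict(entities)
        # Stand-in for Index(MyKind, myFirstProp, mySecondProp): rows sorted
        # by the indexed values, each row carrying the entity key.
        self.index = sorted(
            (e["myFirstProp"], e["mySecondProp"], key)
            for key, e in self.entities.items()
        )
        self.lookups = 0  # counts random-access entity lookups

    def _lookup(self, key):
        self.lookups += 1
        return self.entities[key]

    def full_query(self):
        # SELECT *: scan the index, then fetch every entity by key.
        return [self._lookup(key) for _, _, key in self.index]

    def projection_query(self):
        # Projection: the index rows already hold the requested values.
        return [
            {"myFirstProp": first, "mySecondProp": second}
            for first, second, _ in self.index
        ]

    def get_multi(self, keys):
        # Batch get by key: no index involved, one lookup per key.
        return [self._lookup(key) for key in keys]
```

In this model, `projection_query()` leaves the `lookups` counter untouched, `full_query()` increments it once per index row, and `get_multi()` increments it once per requested key without reading the index.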
IN Operator
In Python GQL (not in the similar Cloud Datastore GQL), there is the IN operator, which allows you to write a query that looks something like:
However, Datastore does not actually support this query natively. Inside the Python client, this will get converted into disjunctive normal form:
This means for each value inside your IN, you'll be issuing a separate Datastore query.
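As a rough illustration of that fan-out, the sketch below expands an IN filter into one equality query per value and merges the results. It runs against a plain list of dicts, not Datastore, and the helpers `expand_in`, `run_equality_query`, and `run_in_query` are hypothetical names, not functions of any client library.

```python
# Hypothetical helpers showing how an IN filter fans out into
# one equality query per value. Not a real client API.

def expand_in(prop, values):
    """Rewrite `prop IN values` as a list of equality filters."""
    return [(prop, "=", value) for value in values]


def run_equality_query(rows, prop, value):
    """Stand-in for a single Datastore query with one equality filter."""
    return [row for row in rows if row.get(prop) == value]


def run_in_query(rows, prop, values):
    """Issue one query per IN value and merge results, deduplicated by key."""
    seen = set()
    merged = []
    queries_issued = 0
    for filter_prop, _, filter_value in expand_in(prop, values):
        queries_issued += 1
        for row in run_equality_query(rows, filter_prop, filter_value):
            if row["key"] not in seen:
                seen.add(row["key"])
                merged.append(row)
    return merged, queries_issued


ROWS = [
    {"key": "k1", "myProp": 1},
    {"key": "k2", "myProp": 2},
    {"key": "k3", "myProp": 5},
]

results, queries_issued = run_in_query(ROWS, "myProp", [1, 2, 3])
```

Here an IN over three values results in three separate queries, even though only two of them return anything.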