Do Cursors deal with Eventual Consistency?

In the App Engine Documentation I found an interesting strategy for keeping up to date with changes in the datastore by using Cursors:

An interesting application of cursors is to monitor entities for unseen changes. If the app sets a timestamp property with the current date and time every time an entity changes, the app can use a query sorted by the timestamp property, ascending, with a Datastore cursor to check when entities are moved to the end of the result list. If an entity's timestamp is updated, the query with the cursor returns the updated entity. If no entities were updated since the last time the query was performed, no results are returned, and the cursor does not move.
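The quoted strategy can be sketched with a toy in-memory model. This is not the Datastore API: `TimestampIndex`, `put`, and `query_after` are hypothetical names, and the cursor is assumed to be simply the last index row returned. It only illustrates how an updated entity moves to the end of a timestamp-sorted index and is picked up by the next poll.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# An index row is (timestamp, key); the index is kept sorted ascending.
IndexRow = Tuple[int, str]


class TimestampIndex:
    """Toy stand-in for an index sorted by a timestamp property (hypothetical)."""

    def __init__(self) -> None:
        self.rows: List[IndexRow] = []
        self.current: Dict[str, int] = {}  # key -> timestamp currently indexed

    def put(self, key: str, timestamp: int) -> None:
        # Updating an entity removes its old row and re-inserts it at the
        # new timestamp, which moves it toward the end of the sorted index.
        old = self.current.get(key)
        if old is not None:
            self.rows.remove((old, key))
        bisect.insort(self.rows, (timestamp, key))
        self.current[key] = timestamp

    def query_after(
        self, cursor: Optional[IndexRow]
    ) -> Tuple[List[str], Optional[IndexRow]]:
        # Assumption: a cursor is modelled as the last index row returned.
        start = 0 if cursor is None else bisect.bisect_right(self.rows, cursor)
        batch = self.rows[start:]
        new_cursor = batch[-1] if batch else cursor
        return [key for _, key in batch], new_cursor


index = TimestampIndex()
index.put("a", 1)
index.put("b", 2)

seen, cursor = index.query_after(None)        # first poll sees both entities
index.put("a", 3)                             # "a" changes and moves to the end
changed, cursor = index.query_after(cursor)   # only the updated entity comes back
idle, cursor_after_idle = index.query_after(cursor)  # nothing new, cursor stays put
```

In this model every write is visible to the very next query, which is exactly the assumption the question below calls into doubt.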

However, I'm not sure this can always work. With the High Replication Datastore, queries are only eventually consistent. So if two entities are put and the query sees only the later of the two, the cursor will move past both of them, which means the first of the two new entities will remain unseen.

So is this an actual issue? Or is there some other way that cursors work around this?

There are 2 answers

1
Ryan On

Having an index, built-in or composite, on a property that holds a monotonically increasing value (such as the current timestamp) may not perform as well as you want at high write rates. This type of workload creates a hotspot, because the tail of the index is constantly being updated rather than the load being distributed throughout the sorted index. For low write rates, however, this will work fine.

The rest of the answer will depend on whether you are in the same entity group or separate entity groups.

If your query is an ancestor query, and therefore confined to a single entity group, it can be strongly consistent (ancestor queries are by default), and the described method should always be accurate. The query will immediately see any writes (changes to an entity inside the entity group).

If you are querying across many entity groups, the query is always eventually consistent, and there is no guarantee about the order in which writes are applied or become visible. For example:

- Time1 - Write EntityA
- Time2 - Write EntityB
- Time3 - Query only sees EntityB
- Time4 - Query sees EntityA and EntityB

So the method of using a cursor to detect a change is correct, but it may "skip" over some changes.
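The timeline above can be reproduced with a small simulation. It is only a sketch: the `WRITES` table, its visibility times, and the `query` helper are made up for illustration, and the cursor is assumed to be the last `(timestamp, key)` row returned.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str]

# Each write: (timestamp property, key, time at which queries start seeing it).
# The visibility times are invented to match the timeline above.
WRITES = [
    (1, "EntityA", 4),  # written at Time1, visible to queries from Time4
    (2, "EntityB", 3),  # written at Time2, visible to queries from Time3
]


def query(now: int, cursor: Optional[Row]) -> Tuple[List[str], Optional[Row]]:
    """Return keys visible at `now` that sort strictly after `cursor`."""
    visible = sorted((ts, key) for ts, key, visible_at in WRITES if visible_at <= now)
    batch = [row for row in visible if cursor is None or row > cursor]
    new_cursor = batch[-1] if batch else cursor
    return [key for _, key in batch], new_cursor


cursor = None
at_time3, cursor = query(3, cursor)  # only EntityB is visible, cursor moves past it
at_time4, cursor = query(4, cursor)  # EntityA is visible now, but sorts before the cursor
```

Here `at_time3` contains only EntityB and `at_time4` is empty, so EntityA is never returned to the poller even though it eventually becomes visible.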

For more information on eventual/strong consistency, see Balancing Strong and Eventual consistency with Google Cloud Datastore

0
dragonx On

You'd be best informed by asking someone who has worked on it, but after thinking about it and re-reading about Paxos a bit, I think it should not be a problem, though that depends on how the datastore is actually implemented.

A cursor is essentially a position in the index. In theory you can re-read from the same cursor over and over and see new entities start appearing after it. In practice, you'll generally move on to the newest cursor position and forget about the old one.

Eventual consistency "problems" appear because there are multiple copies of the index spread across multiple machines. Depending on which copy you read from, you may get stale results.

You describe a problem case where there are two (exact) copies of an index I, and two new entities are created, E1 and E2. Say I1 = I + E1 and I2 = I + E2. Depending on the copy you read from, you might get E1 or E2 as the new entity, move your cursor, and then miss an entity when that copy gets "patched" with the other one, i.e. when I2 eventually becomes I + E1 + E2.

If the datastore actually works that way, then I suspect, yes, you can get a problem. However, it sounds very difficult to operate that way, and I suspect the datastore indexes only get updated after the Paxos voting comes to an agreement. So you'll never see an out-of-order index, you'll only see entities show up late: i.e., you'll never see I + E2, you'll only ever see (I) or (I + E1) or (I + E1 + E2).
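The guess in the previous paragraph can be illustrated with a toy model in which every index copy applies one agreed-upon log in order, so each copy only ever exposes a prefix of it. This is an assumption for the sake of illustration, not confirmed datastore behaviour; `LOG`, `replica_view`, and `poll` are hypothetical names, and the cursor is modelled as a count of log entries already consumed.

```python
from typing import List, Tuple

# Hypothetical single agreed-upon order of index updates.
LOG = ["E1", "E2"]


def replica_view(applied: int) -> List[str]:
    """Index copy that has applied the first `applied` log entries."""
    return LOG[:applied]


def poll(view: List[str], cursor: int) -> Tuple[List[str], int]:
    """Return entries after `cursor`; the cursor counts entries already consumed."""
    batch = view[cursor:]
    return batch, cursor + len(batch)


cursor = 0
collected: List[str] = []
# Read from copies at different stages of catching up, including a stale one.
for applied in (1, 0, 2):
    batch, cursor = poll(replica_view(applied), cursor)
    collected.extend(batch)
```

Under this prefix-only assumption, `collected` ends up as `["E1", "E2"]`: the stale copy in the middle returns nothing, and nothing is skipped. The skip scenario needs a copy that shows E2 without E1, which this model rules out by construction.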

I do suspect, though, that you could end up with a cursor that's too new for an index copy that hasn't caught up yet.