App Engine Memcache Key Prefix Across Versions

746 views Asked by At

Greetings!

I've got a Google App Engine Setup where memcached keys are prefixed with os.environ['CURRENT_VERSION_ID'] in order to produce a new cache on deploy, without having to flush the cache manually.

This was working just fine until it became necessary for development to run two versions of the application at the same time. This, of course, is yielding inconsistencies in caching.

I'm looking for suggestions as to how to prefix the keys now. Essentially, there needs to be a variable that changes across versions when any version is deployed. (Well, this isn't quite ideal, as the cache gets totally blown out.)

I was thinking of the following possibilities:

  • Make a RuntimeEnvironment entity that stores the latest cache prefix. Drawbacks: even if cached, slows down every request. Cannot be cached in memory, only in memcached, as deployment of other version may change it.

  • Use a per-entity version number. This yields very nice granularity in that the cache can stay warm for non-modified entities. The downside is we'd need to push to all versions when models are changed, which I want to avoid in order to test model changes out before deploying to production.

  • Forget key prefix. Global namespace for keys. Write a script to flush the cache on every deploy. This actually seems just as good as, if not better than, the first idea: the cache is totally blown in both scenarios, and this one avoids the overhead of the runtime entity.

Any thoughts, different ideas greatly appreciated!

2

There are 2 answers

2
jbenet On BEST ANSWER

There was a bit of confusion with how i worded the question.

I ended up going for a per-class hash of attributes. Take this class for example:

class CachedModel(db.Model):

  @classmethod
  def cacheVersion(cls):
    if not hasattr(cls, '__cacheVersion'):
      props = cls.properties()
      prop_keys = sorted(props.keys())
      fn = lambda p: '%s:%s' % (p, str(props[p].model_class))
      string = ','.join(map(fn, prop_keys))
      cls.__cacheVersion = hashlib.md5(string).hexdigest()[0:10]
    return cls.__cacheVersion

  @classmethod
  def cacheKey(cls, key):
    return '%s-%s' % (cls.cacheVersion(), str(key))

That way, when entities are saved to memcached using their cacheKey(...), they will share the cache only if the actual class is the same.

This also has the added benefit that pushing an update that does not modify a model, leaves all cache entries for that model intact. In other words, pushing an update no longer acts as flushing the cache.

This has the disadvantage of hashing the class once per instance of the webapp.

UPDATE on 2011-3-9: I changed to a more involved but more accurate way of getting the version. Turns out using __dict__ yielded incorrect results as its str representation includes pointer addresses. This new approach just considers the datastore properties.

UPDATE on 2011-3-14: So python's hash(...) is apparently not guaranteed to be equal between runs of the interpreter. Was getting weird cases where a different app engine instance was seeing different hashes. using md5 (which is faster than sha1 faster than sha256) for now. no real need for it to be crypto-secure. just need an ok hashfn. Will probably switch to use something faster, but for now i rather be bugfree. Also ensured keys were getting sorted, not the property objects.

1
Danny Tuppeny On

The os.environ['CURRENT_VERSION_ID'] value will be different to your two versions, so you will have separate caches for each one (the live one, and the dev/testing one).

So, I assume your problem is that when you "deploy" a version, you do not want the cache from development/testing to be used? (otherwise, like Nick and systempuntoout, I'm confused).

One way of achieving this would be to use the domain/host header in the cache - since this is different for your dev/live versions. You can extract the host by doing something like this:

scheme, netloc, path, query, fragment = urlparse.urlsplit(self.request.url)

# Discard any port number from the hostname
domain = netloc.split(':', 1)[0]

This won't give particularly nice keys, but it'll probably do what you want (assuming I understood correctly).