How to do DB memcaching in Django with derived data?


NOTE: This is a detailed question asking how best to implement and manage database caching in my web application with memcached. The question uses Python/Django to illustrate the data models and usage, but the language is not particularly relevant; I'm mainly interested in the best strategy for maintaining cache coherency. Python/Django just happens to be the language I'm using to illustrate the question.

RULES OF MY APPLICATION:

  1. I have a 3 x 3 grid of cells of integers.
  2. The size of this grid may increase or decrease in the future. Our solution must scale.
  3. There is a cumulative score for each row, calculated by summing (value * Y-Coord) for each cell in that row.
  4. There is a cumulative score for each column, calculated by summing (value * X-Coord) for each cell in that column.
  5. The values in the cells change infrequently, but the values and the scores are read frequently.
  6. I want to use memcached to minimize my database accesses.
  7. I want to minimize/eliminate storing duplicate or derived information in my database.

The image below shows an example of the state of my grid.

[image: example state of the 3 x 3 grid]

MY CODE:

from django.db import models
import memcache

mc = memcache.Client(['127.0.0.1:11211'], debug=0)

class Cell(models.Model):
    x = models.IntegerField(editable=False)
    y = models.IntegerField(editable=False)

    # Whenever this value is updated, the keys for the row and column need to be 
    # invalidated. But not sure exactly how I should manage that.
    value = models.IntegerField()


class Row(models.Model):
    y = models.IntegerField()

    @property
    def cumulative_score(self):
        # I need to do some memcaching here.
        # But not sure the smartest way to do it.
        return sum(map(lambda p: p.x * p.value, Cell.objects.filter(y=self.y)))

class Column(models.Model):
    x = models.IntegerField()

    @property
    def cumulative_score(self):
        # I need to do some memcaching here.
        # But not sure the smartest way to do it.
        return sum(map(lambda p: p.y * p.value, Cell.objects.filter(x=self.x)))

SO HERE IS MY QUESTION:

You can see that I have set up a memcached instance. Of course I know how to insert/delete/update keys and values in memcached. But given my code above, how should I name the keys? Fixed key names won't work, since there must be an individual key for each row and for each column. And critically, how can I ensure that the appropriate keys (and only the appropriate keys) are invalidated when the values in the cells are updated?

How do I manage the cache invalidations whenever anyone updates Cell.value so that database accesses are minimized? Isn't there some Django middleware that can handle this book-keeping for me? The documentation I have seen doesn't cover this.
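
For example, would something along these lines be reasonable? This is only a rough sketch of what I have in mind, using Django's cache framework (pointed at memcached) and a post_save signal rather than the raw memcache client; the key format and the helper names are placeholders I made up, not an existing API.

from django.core.cache import cache
from django.db.models.signals import post_save
from django.dispatch import receiver

def row_key(y):
    # Placeholder key scheme: one cache key per row and one per column.
    return "grid:row:{}".format(y)

def col_key(x):
    return "grid:col:{}".format(x)

@receiver(post_save, sender=Cell)
def invalidate_scores(sender, instance, **kwargs):
    # Drop only the two keys affected by the cell that just changed.
    cache.delete_many([row_key(instance.y), col_key(instance.x)])

But I'm not sure whether a signal, an overridden save(), or some piece of middleware is the recommended place to hook this in.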

There are 3 answers

Árni St. Steinunnarson (BEST ANSWER)

# Your client, be it memcache or redis, assigned to a `client` variable.
# I think both of them let you set() without a TTL for permanent values.

class Cell(models.Model):
    x = models.IntegerField(editable=False)
    y = models.IntegerField(editable=False)
    value = models.IntegerField()

    def save(self, *args, **kwargs):
        # Persist the new value first, then recompute the two affected scores.
        super(Cell, self).save(*args, **kwargs)
        Cell.cache("row", self.y)
        Cell.cache("column", self.x)

    @staticmethod
    def score(dimension, number):
        # Return the cached score, recomputing it on a cache miss.
        cached = client.get(dimension + str(number))
        if cached is None:
            return Cell.cache(dimension, number)
        return cached

    @staticmethod
    def cache(dimension, number):
        # Recompute the score for one row or column and store it in the cache.
        if dimension == "row":
            val = sum(c.y * c.value for c in Cell.objects.filter(y=number))
        elif dimension == "column":
            val = sum(c.x * c.value for c in Cell.objects.filter(x=number))
        else:
            raise ValueError("No such dimension: " + str(dimension))
        client.set(dimension + str(number), val)
        return val
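
A minimal usage sketch, assuming the Row model from the question is kept; Column would mirror it with the "column" dimension:

class Row(models.Model):
    y = models.IntegerField()

    @property
    def cumulative_score(self):
        # Served from the cache; recomputed by Cell.cache() only on a miss.
        return Cell.score("row", self.y)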

tovmeod

If you want to cache individual row/column combinations you should append the object id to the key name.

Given x and y variables:

key = 'x={}_y={}'.format(x, y)

I would use the table name and just append the id: the row id could be the table PK and the column id could be the field name, like this:

key = '{}_{}_{}'.format(table_name, obj.id, column_name)

In any case, I suggest considering caching the whole row instead of individual cells.
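
A minimal sketch of that idea, assuming Django's cache framework is configured to use memcached (cache.get_or_set needs Django 1.9+; the key scheme and helper names are just illustrative):

from django.core.cache import cache

def get_row_values(y):
    # Cache the whole row as a list of (x, value) pairs; timeout=None means no expiry.
    key = 'cell_row_{}'.format(y)
    return cache.get_or_set(
        key,
        lambda: list(Cell.objects.filter(y=y).values_list('x', 'value')),
        None,
    )

def row_score(y):
    # Derive the score in Python from the cached row, mirroring the question's Row property.
    return sum(x * value for x, value in get_row_values(y))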

tcarobruce

The Cell object can invalidate cached values for its Row and Column when the model object is saved.

(Row and Column are plain objects here, not Django models, but of course you can change that if you need to store them in the database for some reason.)

from django.db import models
import memcache

mc = memcache.Client(['127.0.0.1:11211'], debug=0)

class Cell(models.Model):
    x = models.IntegerField(editable=False)
    y = models.IntegerField(editable=False)

    # Whenever this value is updated, the cached row and column scores are
    # invalidated by save() below.
    value = models.IntegerField()

    def invalidate_cache(self):
        Row(self.y).invalidate_cache()
        Column(self.x).invalidate_cache()

    def save(self, *args, **kwargs):
        super(Cell, self).save(*args, **kwargs)
        self.invalidate_cache()


class Row(object):
    def __init__(self, y):
        self.y = y

    @property
    def cache_key(self):
        return "row_{}".format(self.y)

    @property
    def cumulative_score(self):
        score = mc.get(self.cache_key)
        if score is None:  # a cached score of 0 is still a valid hit
            score = sum(p.x * p.value for p in Cell.objects.filter(y=self.y))
            mc.set(self.cache_key, score)
        return score

    def invalidate_cache(self):
        mc.delete(self.cache_key)


class Column(object):
    def __init__(self, x):
        self.x = x

    @property
    def cache_key(self):
        return "column_{}".format(self.x)

    @property
    def cumulative_score(self):
        score = mc.get(self.cache_key)
        if score is None:  # a cached score of 0 is still a valid hit
            score = sum(p.y * p.value for p in Cell.objects.filter(x=self.x))
            mc.set(self.cache_key, score)
        return score

    def invalidate_cache(self):
        mc.delete(self.cache_key)
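
A short usage sketch, assuming the grid cells already exist in the database:

row = Row(2)
print(row.cumulative_score)   # first read hits the DB and primes memcached

cell = Cell.objects.get(x=1, y=2)
cell.value = 7
cell.save()                   # deletes the "row_2" and "column_1" keys

print(row.cumulative_score)   # recomputed from the DB, then cached again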