Syncing MySQL and memcached/Membase


I have an application where I would like to roll up certain information into membase to avoid expensive group by queries. For example, a click conversion will be recorded in MySQL, and I want to keep a running total of clicks grouped by hours for a certain user in a memcache key. There I can un/serialize an array with the values I need. I have many other needs like this with revenue, likes, etc.

What would be the best way to create some sort of "transaction" that ensures memcache and MySQL remain in sync? I could always rebuild the key store from the underlying MySQL store, but I would like to maintain good concurrency between the two products.


1 answer

Answered by Mason Bryant

At a high level, to use membase, memcache, or something similar as a cache in front of MySQL, you'll want to do something like the following:

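public Object readMethod(String key) {
    // Try the cache first.
    Object value = membaseDriver.get(key);
    if (value != null) {
        return value;
    }
    // Cache miss: load from MySQL and repopulate the cache with a TTL.
    value = getFromMysql(key);
    membaseDriver.put(key, value, TTL);
    return value;
}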

public void writeMethod(String key, String value) {
    // Write to MySQL first, so the database stays the source of truth.
    writeToMysql(key, value);
    // Invalidate the cached copy; the next readMethod call reloads the value just written.
    membaseDriver.delete(key);
}

This keeps your DB as the primary source of the data and keeps membase and MySQL nearly in sync. They are out of sync only during the short window in which a process executing the write method has written to MySQL but has not yet deleted the key from membase.

If you need them to be strictly in sync, you have to ensure that no process can execute readMethod while any process is executing writeMethod. You can build a simple global lock in memcache / membase with the add operation, which only succeeds if the key does not already exist. Add a key named after your lock (e.g. "MY_LOCK"): if the add succeeds, you hold the lock, and nobody else can acquire it until you release it. When you are done with your write, release the lock by calling delete on the lock's key. If both methods start by taking that lock and end by releasing it, only one process at a time executes either one. You could also build separate read and write locks on top of that, but I don't think locking is really what you want unless you need to be 100% up to date (as opposed to 99.999% up to date).
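As a rough illustration of the add-based lock, here is a minimal Python sketch. `FakeCache` is a hypothetical in-memory stand-in for a memcached/Membase client, and `cache_lock`, the random token, the TTL, and the retry parameters are all assumptions added for the example. It is not thread-safe itself; against a real server, the atomicity of `add` is what makes the lock work. The TTL is there so a crashed process cannot hold the lock forever.

```python
import time
import uuid
from contextlib import contextmanager


class FakeCache:
    """Hypothetical in-memory stand-in for a memcached/Membase client."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry time or None)

    def _live(self, key):
        # Return the stored entry, dropping it first if its TTL has passed.
        entry = self._data.get(key)
        if entry is None:
            return None
        _, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]
            return None
        return entry

    def add(self, key, value, ttl=0):
        # Like memcached's add: store only if the key is absent.
        if self._live(key) is not None:
            return False
        expires_at = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expires_at)
        return True

    def get(self, key):
        entry = self._live(key)
        return entry[0] if entry is not None else None

    def delete(self, key):
        self._data.pop(key, None)


@contextmanager
def cache_lock(cache, name="MY_LOCK", ttl=5, retry_delay=0.01, timeout=1.0):
    """Hold a global lock for the duration of the with-block."""
    token = uuid.uuid4().hex  # identifies this holder
    deadline = time.monotonic() + timeout
    # Spin until add succeeds, i.e. until nobody else holds the key.
    while not cache.add(name, token, ttl):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"could not acquire {name}")
        time.sleep(retry_delay)
    try:
        yield
    finally:
        # Release only if we still own the lock (it may have expired and
        # been taken by someone else). This get-then-delete pair is not
        # atomic on a real server; it only narrows the window.
        if cache.get(name) == token:
            cache.delete(name)
```

Both the read and the write path would wrap their bodies in `with cache_lock(cache):`, which serializes them at the cost of throughput.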

In the clicks-per-hour case, you can avoid re-running the query every time you count another click by keeping the current hour (i.e. the only one that will change) separate from the array of all previous hours (which will probably never change, right?).

Every time you add a click, just use memcache increment on the current hour's counter. Then, when you get a read request, look up the array of all previous hours, then the current hour, and return all previous hours with the current hour appended to the end. As a free bonus, because increment is atomic, concurrent clicks are counted correctly, so you can skip locking.
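The split between a fixed history and a live counter for the current hour could look roughly like this in Python. Again, `FakeCache` is a hypothetical in-memory stand-in for a real client, and the key names, the `(hour, count)` tuple layout, and the helper functions are assumptions made for the example. The stand-in's `incr` returns `None` for a missing key, mirroring how memcached's incr fails when the key does not exist, so the counter is seeded with `add` first.

```python
class FakeCache:
    """Hypothetical in-memory stand-in for a memcached/Membase client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def add(self, key, value):
        # Store only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def incr(self, key, delta=1):
        # Returns None when the key is missing instead of creating it.
        if key not in self._data:
            return None
        self._data[key] += delta
        return self._data[key]


def hour_bucket(ts):
    """Map a Unix timestamp in seconds to an hour number."""
    return ts // 3600


def current_key(user_id, hour):
    return f"clicks:{user_id}:hour:{hour}"


def record_click(cache, user_id, ts):
    """Count one click in the current hour's counter."""
    key = current_key(user_id, hour_bucket(ts))
    cache.add(key, 0)       # no-op if the counter already exists
    return cache.incr(key)  # atomic on a real server


def clicks_by_hour(cache, user_id, now_ts):
    """Return previous hours plus the live current hour as (hour, count) pairs."""
    # The history array is assumed to be built from the MySQL GROUP BY
    # query and cached; it only changes when an hour rolls over.
    history = cache.get(f"clicks:{user_id}:history") or []
    hour = hour_bucket(now_ts)
    current = cache.get(current_key(user_id, hour)) or 0
    return history + [(hour, current)]
```

MySQL still records every click, so the history array and the counters can be rebuilt from it if the cache is flushed.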