When should cache invalidation happen?

There's a database record. This record has a web page, and that web page is cached. At some point the record may get updated, at which point the cached copy of the web page is stale and no longer needed.

Should the system invalidate the cache immediately when the record is updated, or should this happen only when (and if) the cached web page is next requested, avoiding a potentially unnecessary deletion?

There is 1 answer

Vlad Preda

This depends on your specific requirements.

The way I see it, you have 3 options:

  1. on change - when your entry gets edited, also delete its existing cache entry (and make sure it gets re-created on the next request)
  2. periodically - have a cron job that runs once every X amount of time and rebuilds the whole cache
  3. percent based (not sure what to call it) - when an entry is requested, do something like this:

(Basically, the code below means that roughly once in every 1000 requests, the cache for the requested page is cleared.)

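// mt_rand(1, 1000) returns an integer from 1 to 1000, so this branch
// runs for roughly 1 request in 1000; the value 666 itself is arbitrary.
if (mt_rand(1, 1000) === 666) {
    /** clear the cache for the currently requested page */
}
/** handle request */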

Depending on your traffic and the amount of information you cache (and probably other factors as well), any of these can be useful.

#3 works great when you have a huge cache, while #2 is great with smaller caches that get updated often.
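For option #2, here is a minimal sketch of the kind of script a cron job could run. It assumes a file-based cache with one HTML file per record; the function name rebuildCache, the page_<id>.html naming scheme, and the render callback are all hypothetical stand-ins for whatever your application actually uses.

```php
<?php
/**
 * Re-render every record and overwrite its cached page.
 * $records maps record id => record data; $render turns one record into HTML.
 * Returns the number of pages written.
 */
function rebuildCache(array $records, callable $render, string $cacheDir): int
{
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0775, true);
    }

    $written = 0;
    foreach ($records as $id => $record) {
        // Write to a temporary file first, then rename it into place,
        // so a concurrent reader does not see a half-written page.
        $tmp = tempnam($cacheDir, 'tmp');
        file_put_contents($tmp, $render($record));
        rename($tmp, $cacheDir . '/page_' . $id . '.html');
        $written++;
    }

    return $written;
}

// A crontab entry along these lines would then run it every 15 minutes
// (the script path is a placeholder):
//   */15 * * * * php /path/to/rebuild_cache.php
```

With this approach a page can be stale for up to one full interval, which is the trade-off you accept for not having to track individual changes.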

#1 would be ideal, but it has a very big flaw: sometimes you cannot track certain changes. For example, you can't easily tell when a template file has changed, so you don't know when to re-cache the pages that use it.
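For the changes you can track, option #1 might look roughly like the sketch below. It uses an in-memory array as the cache purely for illustration; PageCache, updateRecord, and the render callback are hypothetical names, and a real application would put a file, Memcached, or similar store behind the same two operations.

```php
<?php
final class PageCache
{
    /** @var array<int, string> cached HTML, keyed by record id */
    private array $pages = [];

    /** Return the cached page, rendering and storing it first if it is missing. */
    public function get(int $id, callable $render): string
    {
        if (!isset($this->pages[$id])) {
            $this->pages[$id] = $render($id);
        }
        return $this->pages[$id];
    }

    /** Drop the cached page; the next get() call re-creates it. */
    public function invalidate(int $id): void
    {
        unset($this->pages[$id]);
    }
}

/** Update the record and invalidate its cached page in the same step. */
function updateRecord(array &$records, PageCache $cache, int $id, string $title): void
{
    $records[$id] = $title;
    $cache->invalidate($id);
}

// Usage: the page is rendered once, invalidated on edit, re-rendered on request.
$records = [1 => 'Old title'];
$cache = new PageCache();
$render = function (int $id) use (&$records): string {
    return '<h1>' . $records[$id] . '</h1>';
};

$cache->get(1, $render);
updateRecord($records, $cache, 1, 'New title');
echo $cache->get(1, $render); // prints <h1>New title</h1>
```

Note that the delete itself is cheap; the expensive part (re-rendering) still only happens if someone actually requests the page again.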

It's up to you to determine your exact needs: the amount of traffic you are getting or expecting, and the size of the cache you will have. There are quite a few tools for running these benchmarks (for example ApacheBench, ab).

PS: You will most likely need a combination of these.

Example:

On an application with a huge cache that changes often, I would do #1 + #3, choosing the percentage based on the traffic the application receives and on benchmark results.
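As a rough sketch of that combination, assuming an array-backed cache passed by reference: onRecordUpdated covers the changes you can track (#1), while servePage occasionally clears the entry anyway to catch the ones you can't (#3). All function names and the $oneIn parameter are made up for illustration, and the optional $rng argument exists only so the random check can be replaced in tests.

```php
<?php
/** True for roughly one call in $oneIn. $rng defaults to mt_rand. */
function shouldClear(int $oneIn, ?callable $rng = null): bool
{
    if ($oneIn < 1) {
        throw new InvalidArgumentException('$oneIn must be at least 1');
    }
    $rng = $rng ?? 'mt_rand';
    return $rng(1, $oneIn) === 1;
}

/** Option #3: on each request, occasionally clear the entry, then serve it. */
function servePage(array &$cache, int $id, callable $render, int $oneIn = 1000, ?callable $rng = null): string
{
    if (shouldClear($oneIn, $rng)) {
        unset($cache[$id]);
    }
    if (!isset($cache[$id])) {
        // Cache miss (or just cleared): re-create the page on request.
        $cache[$id] = $render($id);
    }
    return $cache[$id];
}

/** Option #1: call this whenever a tracked change to the record happens. */
function onRecordUpdated(array &$cache, int $id): void
{
    unset($cache[$id]);
}
```

Tuning then comes down to a single number: a smaller $oneIn means fresher pages but more re-rendering, which is what the benchmarks should help you decide.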

And, to end the answer on a positive note, here is a very nice quote from Leon Bambrick:

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.