I'd like to use an HTTP proxy (such as nginx) to cache large/expensive requests. These resources are identical for any authorized user, but their authentication/authorization needs to be checked by the backend on each request.
It sounds like something like Cache-Control: public, max-age=0 together with the nginx directive proxy_cache_revalidate on; is the way to do this. The proxy can cache the request, but every subsequent request needs to do a conditional GET to the backend to ensure it's authorized before returning the cached resource. The backend then sends a 403 if the user is unauthorized, a 304 if the user is authorized and the cached resource isn't stale, or a 200 with the new resource if it has expired.
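The nginx side of my test is essentially a stock proxy cache along these lines (cache path, zone name and backend address are illustrative):

proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m;

server {
    listen 4000;

    location / {
        proxy_cache            app_cache;
        proxy_cache_revalidate on;                      # turn expired entries into conditional GETs
        add_header             X-Cached $upstream_cache_status;
        proxy_pass             http://127.0.0.1:5000;   # backend sets Cache-Control and checks auth
    }
}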
In nginx, if max-age=0 is set the request isn't cached at all. If max-age=1 is set and I wait 1 second after the initial request, nginx does perform the conditional GET; within that first second, however, it serves the response directly from cache, which is obviously very bad for a resource that needs to be authenticated.
Is there a way to get nginx to cache the request but immediately require revalidating?
Note this does work correctly in Apache. Here are examples for both nginx and Apache, the first 2 with max-age=5, the last 2 with max-age=0:
# Apache with `Cache-Control: public, max-age=5`
$ while true; do curl -v http://localhost:4001/ >/dev/null 2>&1 | grep X-Cache; sleep 1; done
< X-Cache: MISS from 172.x.x.x
< X-Cache: HIT from 172.x.x.x
< X-Cache: HIT from 172.x.x.x
< X-Cache: HIT from 172.x.x.x
< X-Cache: HIT from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
< X-Cache: HIT from 172.x.x.x
# nginx with `Cache-Control: public, max-age=5`
$ while true; do curl -v http://localhost:4000/ >/dev/null 2>&1 | grep X-Cache; sleep 1; done
< X-Cached: MISS
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: REVALIDATED
< X-Cached: HIT
< X-Cached: HIT
# Apache with `Cache-Control: public, max-age=0`
# THIS IS WHAT I WANT
$ while true; do curl -v http://localhost:4001/ >/dev/null 2>&1 | grep X-Cache; sleep 1; done
< X-Cache: MISS from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
# nginx with `Cache-Control: public, max-age=0`
$ while true; do curl -v http://localhost:4000/ >/dev/null 2>&1 | grep X-Cache; sleep 1; done
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
As you can see, in the first two examples the requests can be cached by both Apache and nginx. Apache correctly caches even max-age=0 responses and revalidates them on every request, but nginx does not cache them at all.
If you are unable to modify the backend app as suggested or if the authentication is straightforward such as auth basic, an alternative approach would be to carry out the authentication in Nginx.
Implementing this auth check and defining the cache validity period is all you would have to do; Nginx will take care of the rest, as per the process flow below.
Nginx Process Flow as Pseudo Code:
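Something along these lines, with auth_basic standing in for whatever auth you use (paths, zone name, ports and times are illustrative):

# 1. Authenticate the request in Nginx itself.
# 2. If auth fails, return 401/403 and never touch the cache.
# 3. If auth succeeds and a fresh cached copy exists, serve it from the cache.
# 4. Otherwise proxy to the backend, cache the response for the configured
#    validity period, and return it to the user.

proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m inactive=60m;

server {
    listen 4000;

    location / {
        auth_basic           "Restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;      # auth handled by Nginx, not the backend

        proxy_cache          app_cache;
        proxy_cache_valid    200 5m;                    # your cache validity period
        proxy_ignore_headers Cache-Control Expires;     # cache even if the backend says max-age=0
        proxy_pass           http://127.0.0.1:5000;     # assumed backend
    }
}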
The con is that, depending on the auth type you have, you might need something like the Nginx Lua module to handle the logic.
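For example, with the Lua module a simple cookie check could look roughly like this (the cookie name and backend address are made up):

location / {
    access_by_lua_block {
        -- hypothetical cookie-based check; replace with your real auth logic
        if ngx.var.cookie_session == nil then
            return ngx.exit(ngx.HTTP_FORBIDDEN)
        end
    }

    proxy_cache       app_cache;
    proxy_cache_valid 200 5m;
    proxy_pass        http://127.0.0.1:5000;
}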
EDIT
Having seen the additional discussion and information given: not fully knowing how the backend app works, but looking at the example config the user anki-code gave on GitHub (which you commented on HERE), the config below will avoid the issue you raised of the backend app's authentication/authorization checks not being run for previously cached resources. I assume the backend app returns an HTTP 403 code for unauthenticated users. I also assume that you have the Nginx Lua module in place, since the GitHub config relies on it, although I do note that the part you tested does not need that module.
Config:
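Simplified here to use the stock auth_request module in place of the Lua logic from the GitHub config; the backend address, the card id used as the auth probe, the cache zone name and the times are assumptions:

proxy_cache_path /var/cache/nginx keys_zone=metabase_cache:10m inactive=60m;

server {
    listen 3001;

    # Uncached auth probe: a cheap card query whose only purpose is to make
    # the backend run its authentication/authorization checks on every request.
    location = /auth_probe {
        internal;
        proxy_pass http://127.0.0.1:3000/api/card/42/query;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
    }

    # Expensive card queries (cards 41 and 42 excluded) are cached, but are
    # only served after the auth probe above has succeeded.
    location ~ /api/card((?!/42/|/41/)/[0-9]*/)query {
        auth_request /auth_probe;        # backend returns 403 -> client gets 403, cache is never served
        try_files /dev/null @metabase;   # hand the request over to the named location
    }

    # Named location that does the actual (cached) proxying to the backend.
    location @metabase {
        proxy_cache          metabase_cache;
        proxy_cache_key      $request_uri;
        proxy_cache_valid    200 10m;                   # your cache validity period
        proxy_ignore_headers Cache-Control Expires;     # cache even if the backend says not to
        proxy_pass           http://127.0.0.1:3000;
    }

    # Everything else, including /api/card/42/query itself, goes to the
    # backend uncached.
    location / {
        proxy_pass http://127.0.0.1:3000;
    }
}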
With this, I'll expect that the test with
$ curl 'http://localhost:3001/api/card/1/query'
will run as follows:

First Run (With Required Cookie)
The request matches the location ~ /api/card((?!/42/|/41/)/[0-9]*/)query block. The auth check is made against /api/card/42/query; that location is excluded from caching in the config given, so the backend's authentication/authorization checks run on every request. Since the cookie is valid, the request is then handed over to the @metabase named location block, which handles the actual request and returns the content to the user, from the cache if a fresh copy exists, otherwise from the backend (caching the response on the way back).

Second Run (Without Required Cookie)
The request again matches the location ~ /api/card((?!/42/|/41/)/[0-9]*/)query block, but this time the auth check against /api/card/42/query fails and the backend returns 403. Nginx passes the 403 on to the client, and the cached content is never served.

Instead of /api/card/42/query, if that query is resource intensive, you may be able to create a simple card query that is used purely to do the auth.

Seems a straightforward way to go about it. The backend stays as it is, without messing about with it, and you configure your caching details in Nginx.