How to properly configure grace on Varnish 3.0

312 views Asked by At

I'm trying to configure Varnish to keep into cache for 1 minute a 3rd party page (proxied from a service not under my control). Unluckily this service sometimes can be down and this transfers troubles to my site; to prevent this I want also to serve from the local cache when the service is not available.

I followed tips from grace and saint mode.

Follow some snippets from my VCL file.

Backend configuration:

backend external {
    .host = "external.acme.com";
    .port = "80";
    .probe = {
       .url = "/page";
       .interval = 5s;
       .timeout = 5s;
       .window = 1;
       .threshold = 1;
    }
}

vcl_recv:

sub vcl_recv {
    set req.backend = my_backend;

    ...

    # redirect onto 3rd part service
    if (req.url ~ "www.acme.com:80/page$") {
       set req.url = regsub(req.url, "(.*)", "http://external.acme.com/page");
       set req.backend = external;
    }

    if (!req.backend.healthy) {
        set req.grace = 2h;
    } else {
        set req.grace = 10s;
    }
    ...

vcl_fetch:

sub vcl_fetch {

    if (beresp.status == 503) {
        set beresp.saintmode = 5s;
        #set beresp.ttl = 1h;
        return (restart);
    }
    set beresp.grace = 2h;
    set beresp.ttl = 2h;

    if (req.http.host == "external.acme.com") {
        set beresp.http.X-Varnish-Action = "Caching external.acme.com";
        set beresp.ttl = 1m;
        return(deliver);
    }

I'm testing the down of the remote service by using iptables and the configuration is working:

  • Proxy of the remote page is OK
  • Health polling of the 3rp party page is OK

My only issue is that: when I shutdown the remote service, the 3rd party page is served by the cache for not more than one minute (the TTL configured in vcl_fetch) which is a good TTL for when the remote page is available... but I want to continue to serve this for much longer (2 hours).

This seems in contrast with what I read there: http://book.varnish-software.com/3.0/Saving_a_request.html#req-grace-and-beresp-grace

My suspect is that I'm not entering the grace mode, but the only reason I found on documentation is on not configured health check, but If I look into varnishlog I see health check running properly ("Still healthy", "Went sick", ...)

What I'm missing? There's a way to get what I need?

varnishlog traces

This log when all is going well (the remote server is up and running):

    0 Backend_health - external Still healthy 4--X-RH 1 1 1 0.010659 0.010557 HTTP/1.1 200 OK
   12 BackendClose - external
   12 BackendOpen  b external ...
   12 TxRequest    b GET
   12 TxURL        b ...
   12 TxProtocol   b HTTP/1.1
   12 TxHeader     b Cache-Control: max-age=0
   12 TxHeader     b Upgrade-Insecure-Requests: 1
   12 TxHeader     b User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36
   12 TxHeader     b Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
   12 TxHeader     b Accept-Encoding: gzip, deflate, sdch, br
   12 TxHeader     b Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4,de;q=0.2
   12 TxHeader     b Cookie:...
   12 TxHeader     b If-None-Match: "6338e-4c0f-54429328f4d3d"
   12 TxHeader     b If-Modified-Since: Wed, 21 Dec 2016 11:10:16 GMT
   12 TxHeader     b Via: 1.1 ...
   12 TxHeader     b X-Forwarded-For: ...
   12 TxHeader     b X-Forwarded-Host: ...
   12 TxHeader     b X-Forwarded-Server: ...
   12 TxHeader     b host: ...
   12 TxHeader     b X-Varnish: 1305584183
   12 RxProtocol   b HTTP/1.1
   12 RxStatus     b 304
   12 RxResponse   b Not Modified
   12 RxHeader     b Date: Thu, 22 Dec 2016 09:25:21 GMT
   12 RxHeader     b Server: Apache/2.2.22 (Debian)
   12 RxHeader     b ETag: "6338e-4c0f-54429328f4d3d"
   12 RxHeader     b Vary: Accept-Encoding
   12 Fetch_Body   b 0(none) cls 0 mklen 0
   12 Length       b 0
   12 BackendReuse b external
    3 SessionOpen  c ::1 60278 :6081
    3 ReqStart     c ::1 60278 1305584183
    3 RxRequest    c GET
    3 RxURL        c /VirtualHostBase/...
    3 RxProtocol   c HTTP/1.1
    3 RxHeader     c Host: localhost:6081
    3 RxHeader     c Cache-Control: max-age=0
    3 RxHeader     c Upgrade-Insecure-Requests: 1
    3 RxHeader     c User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36
    3 RxHeader     c Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    3 RxHeader     c Accept-Encoding: gzip, deflate, sdch, br
    3 RxHeader     c Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4,de;q=0.2
    3 RxHeader     c Cookie:...
    3 RxHeader     c If-None-Match: "6338e-4c0f-54429328f4d3d"
    3 RxHeader     c If-Modified-Since: Wed, 21 Dec 2016 11:10:16 GMT
    3 RxHeader     c Via: 1.1 ...
    3 RxHeader     c X-Forwarded-For: ...
    3 RxHeader     c X-Forwarded-Host: ...
    3 RxHeader     c X-Forwarded-Server: ...
    3 RxHeader     c Connection: Keep-Alive
    3 VCL_call     c recv pass
    3 VCL_call     c hash
    3 Hash         c http://...
    3 Hash         c ...
    3 VCL_return   c hash
    3 VCL_call     c pass pass
    3 Backend      c 12 external external
    3 TTL          c 1305584183 RFC -1 -1 -1 1482398514 0 1482398721 0 0
    3 VCL_call     c fetch
    3 TTL          c 1305584183 VCL -1 7200 -1 1482398514 -0
    3 TTL          c 1305584183 VCL 7200 7200 -1 1482398514 -0
    3 TTL          c 1305584183 VCL 60 7200 -1 1482398514 -0
    3 VCL_return   c deliver
    3 ObjProtocol  c HTTP/1.1
    3 ObjResponse  c Not Modified
    3 ObjHeader    c Date: Thu, 22 Dec 2016 09:25:21 GMT
    3 ObjHeader    c Server: Apache/2.2.22 (Debian)
    3 ObjHeader    c ETag: "6338e-4c0f-54429328f4d3d"
    3 ObjHeader    c Vary: Accept-Encoding
    3 ObjHeader    c X-Varnish-Action: Caching external.acme.com
    3 VCL_call     c deliver deliver
    3 TxProtocol   c HTTP/1.1
    3 TxStatus     c 304
    3 TxResponse   c Not Modified
    3 TxHeader     c Server: Apache/2.2.22 (Debian)
    3 TxHeader     c ETag: "6338e-4c0f-54429328f4d3d"
    3 TxHeader     c Vary: Accept-Encoding
    3 TxHeader     c X-Varnish-Action: Caching external.acme.com
    3 TxHeader     c Accept-Ranges: bytes
    3 TxHeader     c Date: Thu, 22 Dec 2016 09:21:53 GMT
    3 TxHeader     c X-Varnish: 1305584183
    3 TxHeader     c Age: 0
    3 TxHeader     c Via: 1.1 varnish
    3 TxHeader     c Connection: keep-alive
    3 Length       c 0
    3 ReqEnd       c 1305584183 1482398513.625868082 1482398513.630025387 0.000263691 0.004073620 0.000083685
    3 SessionClose c EOF
    3 StatSess     c ::1 60278 0 1 1 0 1 1 290 0
    0 CLI          - Rd ping
    0 CLI          - Wr 200 19 PONG 1482398514 1.0
    0 CLI          - Rd ping
    0 CLI          - Wr 200 19 PONG 1482398517 1.0
    0 Backend_health - external Still healthy 4--X-RH 1 1 1 0.010621 0.010573 HTTP/1.1 200 OK

This is what I log when the remote goes down (but I still get the cached content as TTL has not been reached):

0 CLI          - Rd ping
0 CLI          - Wr 200 19 PONG 1482398997 1.0
0 Backend_health - external Went sick ------- 0 1 1 0.000000 0.010652
0 CLI          - Rd ping
0 CLI          - Wr 200 19 PONG 1482399000 1.0
3 SessionOpen  c ::1 60382 :6081
3 ReqStart     c ::1 60382 1305584190
3 RxRequest    c GET
3 RxURL        c /VirtualHostBase/https/...
3 RxProtocol   c HTTP/1.1
3 RxHeader     c Host: localhost:6081
3 RxHeader     c Pragma: no-cache
3 RxHeader     c Cache-Control: no-cache
3 RxHeader     c Upgrade-Insecure-Requests: 1
3 RxHeader     c User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36
3 RxHeader     c Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
3 RxHeader     c Accept-Encoding: gzip, deflate, sdch, br
3 RxHeader     c Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4,de;q=0.2
3 RxHeader     c Cookie: ...
3 RxHeader     c Via: 1.1 ...
3 RxHeader     c X-Forwarded-For: ...
3 RxHeader     c X-Forwarded-Host: ...
3 RxHeader     c X-Forwarded-Server: ...
3 RxHeader     c Connection: Keep-Alive
3 VCL_call     c recv lookup
3 VCL_call     c hash
3 Hash         c http://...
3 Hash         c ...
3 VCL_return   c hash
3 Hit          c 1305584185
3 VCL_call     c hit deliver
3 VCL_call     c deliver deliver
3 TxProtocol   c HTTP/1.1
3 TxStatus     c 200
3 TxResponse   c OK
3 TxHeader     c Server: Apache/2.2.22 (Debian)
3 TxHeader     c Access-Control-Allow-Origin: *
3 TxHeader     c Last-Modified: Wed, 21 Dec 2016 11:10:16 GMT
3 TxHeader     c ETag: "6338e-4c0f-54429328f4d3d"
3 TxHeader     c Vary: Accept-Encoding
3 TxHeader     c Content-Encoding: gzip
3 TxHeader     c Content-Type: text/html
3 TxHeader     c X-Varnish-Action: Caching external.acme.com
3 TxHeader     c Content-Length: 4367
3 TxHeader     c Accept-Ranges: bytes
3 TxHeader     c Date: Thu, 22 Dec 2016 09:30:03 GMT
3 TxHeader     c X-Varnish: 1305584190 1305584185
3 TxHeader     c Age: 39
3 TxHeader     c Via: 1.1 varnish
3 TxHeader     c Connection: keep-alive
3 Length       c 4367
3 ReqEnd       c 1305584190 1482399003.173985004 1482399003.174190044 0.000112534 0.000087976 0.000117064
3 SessionClose c EOF
3 StatSess     c ::1 60382 0 1 1 0 0 0 441 4367
0 CLI          - Rd ping
0 CLI          - Wr 200 19 PONG 1482399003 1.0
0 CLI          - Rd ping
0 CLI          - Wr 200 19 PONG 1482399006 1.0

And finally this the error I get after TTL over (after more than a minute)

    0 CLI          - Rd ping
    0 CLI          - Wr 200 19 PONG 1482399282 1.0
    0 CLI          - Rd ping
    0 CLI          - Wr 200 19 PONG 1482399285 1.0
   12 SessionOpen  c ::1 60412 :6081
   12 ReqStart     c ::1 60412 1305584191
   12 RxRequest    c GET
   12 RxURL        c /VirtualHostBase/https/...
   12 RxProtocol   c HTTP/1.1
   12 RxHeader     c Host: localhost:6081
   12 RxHeader     c Cache-Control: max-age=0
   12 RxHeader     c Upgrade-Insecure-Requests: 1
   12 RxHeader     c User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36
   12 RxHeader     c Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
   12 RxHeader     c Accept-Encoding: gzip, deflate, sdch, br
   12 RxHeader     c Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4,de;q=0.2
   12 RxHeader     c Cookie: ...
   12 RxHeader     c If-None-Match: "6338e-4c0f-54429328f4d3d"
   12 RxHeader     c If-Modified-Since: Wed, 21 Dec 2016 11:10:16 GMT
   12 RxHeader     c Via: 1.1 ...
   12 RxHeader     c X-Forwarded-For: ...
   12 RxHeader     c X-Forwarded-Host: ...
   12 RxHeader     c X-Forwarded-Server: ...
   12 RxHeader     c Connection: Keep-Alive
   12 VCL_call     c recv pass
   12 VCL_call     c hash
   12 Hash         c http://...
   12 Hash         c ...
   12 VCL_return   c hash
   12 VCL_call     c pass pass
   12 FetchError   c no backend connection
   12 VCL_call     c error deliver
   12 VCL_call     c deliver deliver
   12 TxProtocol   c HTTP/1.1
   12 TxStatus     c 503
   12 TxResponse   c Service Unavailable
   12 TxHeader     c Server: Varnish
   12 TxHeader     c Content-Type: text/html; charset=utf-8
   12 TxHeader     c Retry-After: 5
   12 TxHeader     c Content-Length: 419
   12 TxHeader     c Accept-Ranges: bytes
   12 TxHeader     c Date: Thu, 22 Dec 2016 09:34:48 GMT
   12 TxHeader     c X-Varnish: 1305584191
   12 TxHeader     c Age: 0
   12 TxHeader     c Via: 1.1 varnish
   12 TxHeader     c Connection: close
   12 Length       c 419
   12 ReqEnd       c 1305584191 1482399288.281390190 1482399288.281631470 0.000269890 0.000148058 0.000093222
   12 SessionClose c error
   12 StatSess     c ::1 60412 0 1 1 0 1 0 257 419
    0 CLI          - Rd ping
    0 CLI          - Wr 200 19 PONG 1482399288 1.0
    0 Backend_health - external Still sick ------- 0 1 1 0.000000 0.010652
    0 CLI          - Rd ping
    0 CLI          - Wr 200 19 PONG 1482399291 1.0
1

There are 1 answers

4
alejdg On

I believe your problem lies on the line return (restart); in vcl_fetch. The restart will force Varnish to process the request from the beginning with a different backend, as you don't have another one this will raise an error. This is the cause that your object is served from the cache for not more than its TTL.

This would be the correct flow:

  1. TTL valid => deliver from cache;

  2. TTL invalid => try to fetch from backend;

  3. Backend not responding => use grace TTL;

  4. Grace TTL valid => deliver from staled cache;

The restart is changing this flow by not letting Varnish get to step 4. Try removing it and see the magic happen.