Google Cloud CDN cache evicted too soon: expected behavior or a bug?


I'm checking Cloud CDN cache hits and misses from the same PC, with the same client address, requesting the same URL:

Scenario 1 (Cache-Control set to 1 day, 1 month, or 1 year):
- 12:00: User1 requests the URL. It is not in the cache (MISS), so the cache is filled.
- 12:05: User1 requests the URL. It is served from the cache (HIT).
- 12:10: User1 requests the URL. It is not found again (MISS), and the cache is filled again.

Scenario 2 (same internet gateway; Cache-Control set to 1 day, 1 month, or 1 year):
- 12:00: In our organization's building, User1 requests the URL. The first request is a MISS; the second request is a HIT.
- 12:01: In the same building, User2 requests the same URL and, once again, the first request is a MISS and only the second request is a HIT.

Scenario 3 (same internet gateway; Cache-Control set to 1 day, 1 month, or 1 year):
- 12:00: In our organization's building, User1 requests the URL with Edge. The first request is a MISS and the second is a HIT. Then the same user, on the same PC, opens Chrome or Firefox and requests the URL: once again it is a MISS, and the cache has to be filled again.

Why does the content drop out of the cache so soon, even with Cache-Control set to 1 day, 1 month, or 1 year, and why does switching browsers cause a miss? Is this a bug?

Answer by Anurag Sharma:

Cache modes control the factors that determine whether and how Cloud CDN caches your content. For example, if you use USE_ORIGIN_HEADERS as the cache mode, look at the max-age and s-maxage directives in the response to determine the TTL of the cached content. Because s-maxage overrides max-age for shared caches such as a CDN, the s-maxage value is the one that applies when both are set. Best practice is to keep this value fairly large so that cached content does not expire too soon.
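To make the precedence rule concrete, here is a minimal sketch of how a shared cache might derive a TTL from a Cache-Control header under USE_ORIGIN_HEADERS-style behavior. The function name `effective_shared_ttl` is hypothetical, and the logic is a simplification of the HTTP caching rules: it only handles `s-maxage`, `max-age`, `private`, and `no-store`, and it ignores any defaults, caps, or other settings your Cloud CDN configuration may apply.

```python
from typing import Optional


def effective_shared_ttl(cache_control: str) -> Optional[int]:
    """Return the TTL (seconds) a shared cache would use, or None if it would not cache.

    Simplified illustration only: s-maxage takes precedence over max-age for
    shared caches, and 'private' or 'no-store' make the response uncacheable.
    """
    directives = {}
    for part in cache_control.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"')

    # A CDN must not store private or no-store responses.
    if "no-store" in directives or "private" in directives:
        return None

    # Check s-maxage first because it overrides max-age for shared caches.
    for name in ("s-maxage", "max-age"):
        if name in directives:
            try:
                return int(directives[name])
            except ValueError:
                continue  # malformed value: fall through to the next directive
    return None  # no explicit freshness lifetime


print(effective_shared_ttl("public, max-age=86400, s-maxage=31536000"))  # 31536000
print(effective_shared_ttl("public, max-age=2592000"))  # 2592000
print(effective_shared_ttl("private, max-age=600"))  # None
```

Note that the returned value is only an upper bound on how long a cached copy may be served; a given cache can still evict the content earlier.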

Also, to get the most out of Cloud CDN, each URL needs a higher volume of incoming requests.

Now consider an example where a user has an HTTPS load balancer with Cloud Run endpoints in europe-west1. A request for a given URL can reach endpoints in any of the zones europe-west1-a, europe-west1-b, or europe-west1-c. The request first reaches the primary GFEs (Google Front Ends) near the user, and then, if the primary GFEs do not have the requested data in their cache, the secondary GFEs available in each zone.

A new request hits a primary GFE near the user. If that GFE has no cached data for the request, the data is fetched from the backend and cached in that GFE. However, there is a high chance that the second request is handled by a different primary GFE. For every request to be served from a primary GFE's cache, the content must first have been cached by each of the primary GFEs near the user.

When the data is not in the primary GFE's cache, the request goes to a secondary GFE in the region. Suppose the second request goes to a secondary GFE that has nothing cached for that URL: the request is forwarded to the backend. Now suppose that, for the third request, a primary GFE chooses a different secondary GFE in another zone, which also has no entry: again the request goes to the backend. So it is possible that, for the first several requests, the primary GFEs forward to a different secondary GFE each time, none of which has the URL cached, and every one of those requests has to be served by the backend.
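The walkthrough above can be illustrated with a toy two-tier cache simulation. This is a minimal sketch under loudly stated assumptions: the `Cache` class, the `simulate` function, the number of primary and secondary caches, and the routing sequence are all hypothetical and chosen by hand. It does not model how Google actually selects GFEs or fills caches; it only shows why repeated requests for the same URL can keep reaching the backend when each request lands on a cache that has not seen the URL yet.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Cache:
    name: str
    entries: Set[str] = field(default_factory=set)


def simulate(url: str, routes: List[Tuple[int, int]],
             primaries: List[Cache], secondaries: List[Cache]) -> List[str]:
    """Return where each request was served from: 'primary', 'secondary' or 'backend'.

    Each route is a (primary index, secondary index) pair picked by hand,
    standing in for whichever GFEs a request happens to reach.
    """
    results = []
    for p_idx, s_idx in routes:
        primary, secondary = primaries[p_idx], secondaries[s_idx]
        if url in primary.entries:
            results.append("primary")  # hit in the cache closest to the user
            continue
        if url in secondary.entries:
            results.append("secondary")  # primary missed, secondary hit
        else:
            results.append("backend")  # both tiers missed: origin is contacted
            secondary.entries.add(url)
        primary.entries.add(url)  # response is cached in the primary on the way back
    return results


# Hypothetical setup: four primary caches near the user, one secondary per zone.
primaries = [Cache(f"primary-{i}") for i in range(4)]
secondaries = [Cache(f"secondary-europe-west1-{z}") for z in "abc"]
routes = [(0, 0), (1, 1), (2, 2), (3, 0), (0, 1)]
print(simulate("/img/logo.png", routes, primaries, secondaries))
# ['backend', 'backend', 'backend', 'secondary', 'primary']
```

With this routing, the first three requests all go to the backend even though every response is cacheable, which matches the pattern described in the answer: hits only become consistent once enough of the caches a client can reach have been filled.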

Regarding your concern about cache hits depending on the browser: this behavior is intended, because Google Cloud uses anycast virtual IPs to load balance CDN traffic (which also explains the behavior in the example above). Some other CDN providers load balance at the DNS level, so all requests go to the same edge server.

Answer by elving:

It's not a bug. In many metropolitan areas, Google Cloud CDN operates multiple caches. If you check the logs for the cache misses in your example, you will likely find the requests were served by distinct caches. You won't get cache hits from a particular cache until that cache has had a chance to cache the content.

cloud.google.com/cdn/docs/logging describes how to view log entries. In each log entry, the cacheId field identifies which cache served the response. Even once a response is cached, max-age and s-maxage specify only the maximum amount of time that response can be used; a cache may evict it before then. There's more information about expiration and eviction at cloud.google.com/cdn/docs/overview#eviction_and_expiration.
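To check whether your misses came from different caches, you could tally hits and misses per cacheId. The sketch below assumes the log entries have already been exported as JSON (for example with `gcloud logging read 'resource.type="http_load_balancer"' --format=json`) and that the cache ID sits at `jsonPayload.cacheId` and the hit flag at `httpRequest.cacheHit`; verify those field paths against your own log entries. The function name `hits_by_cache` and the sample cache IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def hits_by_cache(entries: List[dict]) -> Dict[str, Dict[str, int]]:
    """Count cache hits and misses per cacheId for already-parsed log entries."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"hit": 0, "miss": 0})
    for entry in entries:
        # Assumed field paths; entries without a cacheId are grouped as "unknown".
        cache_id = entry.get("jsonPayload", {}).get("cacheId", "unknown")
        # A false boolean may be omitted from the JSON, so default to a miss.
        hit = entry.get("httpRequest", {}).get("cacheHit", False)
        counts[cache_id]["hit" if hit else "miss"] += 1
    return dict(counts)


# Made-up entries shaped like Scenario 1: the 12:10 miss came from another cache.
entries = [
    {"jsonPayload": {"cacheId": "cache-A"}, "httpRequest": {}},
    {"jsonPayload": {"cacheId": "cache-A"}, "httpRequest": {"cacheHit": True}},
    {"jsonPayload": {"cacheId": "cache-B"}, "httpRequest": {}},
]
print(hits_by_cache(entries))
# {'cache-A': {'hit': 1, 'miss': 1}, 'cache-B': {'hit': 0, 'miss': 1}}
```

If the misses cluster on cache IDs that had not served the URL before, that points to multiple caches rather than premature expiration.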