Background: ETag tracking is well explained here and also mentioned on Wikipedia.
An answer I wrote in a response to "How can I prevent tracking by ETags?" has driven me to write this question.
I have a browser-side solution which prevents ETag tracking. It works without modifying the current HTTP protocol. Is this a viable solution to ETag tracking?
Instead of telling the server our ETag we ASK the server about its ETag, and we compare it to the one we already have.
Pseudo code:
If (file_not_in_cache)
{
page=http_get_request();
page.display();
page.put_in_cache();
}
else
{
page=load_from_cache();
client_etag=page.extract_etag();
server_etag=http_HEAD_request().extract_etag();
//Instead of saying "my etag is xyz",
//the client says: "what is YOUR etag, server?"
if (server_etag==client_etag)
{
page.display();
}
else
{
page.remove_from_cache();
page=http_get_request();
page.display();
page.put_in_cache();
}
}
HTTP conversation example with my solution:
Client:
HEAD /posts/46328
host: security.stackexchange.com
Server:
HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
ETag: "EVIl_UNIQUE_TRACKING_ETAG"
Content-Type: text/html
Content-Length: 131
Case 1, Client has an identical ETag:
Connection closes, client loads page from cache.
Case 2, client has a mismatching ETag:
GET...... //and a normal http conversation begins.
Extras that do require modifying the HTTP specification
Think of the following as theoretical material, the HTTP spec probably won't change any time soon.
1. Removing HEAD overhead
It is worth noting that there is minor overhead, the server has to send the HTTP header twice: Once in response to the HEAD, and once in response to the GET. One theoretical workaround for this is modifying the HTTP protocol and adding a new method which requests header-less content. Then the client would request the HEAD only, and after that the content only, if the ETags mismatch.
2. Preventing cache based tracking (or at least making it a lot harder)
Although the workaround suggested by Sneftel is not an ETag tracking technique, it does track people even when they're using the "HEAD, GET" sequence I suggested. The solution would be restricting the possible values of ETags: Instead of being any sequence, the ETag has to be a checksum of the content. The client checks this, and in case there is a mismatch between the checksummed value and the value sent by the server, the cache is not used.
Side note: fix 2 would also eliminate the following Evercookie tracking techniques: pngData, etagData, cacheData. Combining that with Chrome's "Keep local data only until I quit my browser" eliminates all evercookie tracking techniques except Flash and Silverlight cookies.
It sounds reasonable, but workarounds exist. Suppose the front page was always given the same etag (so that returning visitors would always load it from cache), but the page itself referenced a differently-named image each time it was loaded. Your GET or HEAD request for this image would then uniquely identify you. Arguably this isn't an etag-based attack, but it still uses your cache to identify you.