Can a CDN build some kind of statistics by tracking visitors to my site, or do browsers download the needed libraries without sharing the URL of the page being visited?


There are 3 answers

AudioBubble (best answer)

Yes, they can use the Referer header field:

The HTTP referer (originally a misspelling of referrer) is an HTTP header field that identifies the address of the webpage (i.e. the URI or IRI) that linked to the resource being requested. By checking the referrer, the new webpage can see where the request originated.

The field is part of the request headers. For example, if you reload this page with the browser's developer tools open (F12, Network tab), you can see that the request for

http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js

was sent with these request headers:

Host: ajax.googleapis.com
User-Agent: Mozilla/5.0 (...)
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://stackoverflow.com/questions/30743915/does-cdn-know-which-website-the-client-is-visiting-when-fetching-jquery-min-js-o
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

The Referer line reports which page the request came from:

Referer: http://stackoverflow.com/questions/30743915/does-cdn-know-which-website-the-client-is-visiting-when-fetching-jquery-min-js-o
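As a rough illustration of how a CDN could turn that header into per-site statistics, here is a minimal Python sketch. It assumes the CDN already has each request's headers as a dict (in practice they would come from its access logs); `referring_sites` is a hypothetical helper, not part of any real CDN's API, and the example.com entry is made up.

```python
from collections import Counter
from urllib.parse import urlsplit


def referring_sites(request_headers):
    """Count CDN hits per referring host (hypothetical helper).

    Each item is a dict of request headers for one hit. Header names are
    assumed to be normalised to "Referer" already. Hits without a Referer
    are counted under None.
    """
    counts = Counter()
    for headers in request_headers:
        referer = headers.get("Referer")
        # Keep only the host part, e.g. "stackoverflow.com"
        counts[urlsplit(referer).hostname if referer else None] += 1
    return counts


hits = [
    {"Host": "ajax.googleapis.com",
     "Referer": "http://stackoverflow.com/questions/30743915/does-cdn-know-which-website-the-client-is-visiting-when-fetching-jquery-min-js-o"},
    {"Host": "ajax.googleapis.com", "Referer": "https://example.com/index.html"},
    {"Host": "ajax.googleapis.com"},  # no Referer header sent
]
print(referring_sites(hits))
```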

Stefano Sanfilippo

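Yes, they can learn which page requested the resource (e.g. by looking at the Referer header), so they could track which websites requested a certain resource. The main exception is when an HTTPS page requests a resource over a non-secure connection: in that case the Referer won't be set. The Origin header is not a dependable fallback, because browsers do not send it on ordinary GET requests for scripts, stylesheets or images. Also note that current browsers default to the strict-origin-when-cross-origin referrer policy, so a cross-origin CDN usually receives only the site's origin (e.g. https://stackoverflow.com/) rather than the full page URL.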

Tracking individual users could certainly be done, but it's impractical for a number of reasons:

  1. CDN resources are meant to be heavily cached by browsers, so each browser requests and downloads a resource once and then reuses it across many page views, which makes "passive" statistics unreliable.

  2. Forcing the user to download the resource on every page visit defeats the purpose of a CDN, slows down navigation for no reason and wastes the CDN's bandwidth. This was the technique used by the long-dead view counters on 1990s GeoCities pages (sigh).

  3. At a minimum, tracking users requires setting an identifying cookie. This adds complexity to the web service (it can no longer be a simple file server) and latency to every response, since the UID has to be looked up in some kind of database or newly generated. ETags could be abused for the same purpose, with the same drawbacks as cookies.

  4. Alternatively, query string parameters could work, but they require cooperation from the embedding page, which has to add the UID as a parameter to every request, so the URLs cannot be hard-coded. I assume this is not the case you are asking about.
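To make point 3 concrete, here is a minimal Python sketch of the extra per-request work a tracking CDN would take on: parsing cookies, generating or looking up a UID, and storing per-visitor state. `TrackingCDN`, the `uid` cookie name and the in-memory dict standing in for a database are all hypothetical. Because of point 1, such a CDN would only ever see the requests that miss the browser cache.

```python
import uuid
from http.cookies import SimpleCookie


class TrackingCDN:
    """Hypothetical CDN front end that tags each browser with a UID cookie."""

    def __init__(self):
        # uid -> list of Referer values seen (stand-in for a real database)
        self.visits = {}

    def handle(self, headers):
        """Process one request's headers; return (uid, extra response headers)."""
        cookie = SimpleCookie(headers.get("Cookie", ""))
        if "uid" in cookie:
            uid = cookie["uid"].value  # returning browser: look up its UID
            response_headers = {}
        else:
            uid = uuid.uuid4().hex  # new browser: generate a UID
            # Cross-site cookies need SameSite=None; Secure to be sent at all
            response_headers = {
                "Set-Cookie": f"uid={uid}; Max-Age=31536000; SameSite=None; Secure"
            }
        self.visits.setdefault(uid, []).append(headers.get("Referer"))
        return uid, response_headers


cdn = TrackingCDN()
uid, resp = cdn.handle({"Referer": "https://example.com/a"})
# The browser sends the cookie back on its next uncached request
cdn.handle({"Cookie": f"uid={uid}", "Referer": "https://example.com/b"})
print(cdn.visits[uid])  # ['https://example.com/a', 'https://example.com/b']
```

From the visited site's point of view this is a third-party cookie, so modern browsers may block or partition it, which makes the approach even less reliable.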

To sum up, a CDN could track your visitors, but the downsides outweigh the hypothetical gain, assuming performance (and the profitability tied to it) is the main goal of running a CDN. If analytics are worth more than performance or low operating costs, as could be the case for a free CDN, an operator could trade performance for insight by applying points 2 and 3.

At that point, the operator would have to demonstrate that the collected stats are sound before selling them for any marketing purpose. Besides, the kind of files CDNs usually serve makes them fairly uninteresting. For instance, I don't see much profit in knowing how many people out there use a certain version of jQuery.

lawazoni

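Add <meta name="referrer" content="same-origin"> to your page's <head>. With this policy the browser sends the Referer header only on same-origin requests, so no referrer is sent to a CDN hosted on a different origin.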
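To see what different policies mean for the CDN, here is a simplified Python model of a few Referrer-Policy values, including strict-origin-when-cross-origin (the default in current browsers) and no-referrer-when-downgrade (the older default that drops the Referer when an HTTPS page loads a plain-HTTP resource). It is a sketch, not the full algorithm from the Referrer Policy specification: it ignores "potentially trustworthy" non-HTTPS origins and default-port normalisation. `referer_sent` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def _origin(url):
    # Origin = (scheme, host, port); default ports are not normalised here
    p = urlsplit(url)
    return (p.scheme, p.hostname, p.port)


def _full(url):
    # Full referrer form of a URL: drop credentials and the #fragment
    p = urlsplit(url)
    return urlunsplit((p.scheme, p.netloc.rsplit("@", 1)[-1], p.path, p.query, ""))


def _origin_only(url):
    # Origin-only referrer, e.g. "https://example.com/"
    p = urlsplit(url)
    return urlunsplit((p.scheme, p.netloc.rsplit("@", 1)[-1], "/", "", ""))


def referer_sent(page_url, resource_url, policy):
    """Return the Referer value a browser would send, or None if omitted."""
    same_origin = _origin(page_url) == _origin(resource_url)
    # Simplified downgrade check: an HTTPS page fetching a plain-HTTP resource
    downgrade = (urlsplit(page_url).scheme == "https"
                 and urlsplit(resource_url).scheme == "http")
    if policy == "no-referrer":
        return None
    if policy == "same-origin":
        return _full(page_url) if same_origin else None
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else _full(page_url)
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return _full(page_url)
        return None if downgrade else _origin_only(page_url)
    raise ValueError(f"policy not modelled in this sketch: {policy}")


page = "https://example.com/some/page.html"
cdn_url = "https://cdn.example.net/jquery.min.js"
for policy in ("no-referrer-when-downgrade",
               "strict-origin-when-cross-origin",
               "same-origin"):
    print(policy, referer_sent(page, cdn_url, policy))
```

Under these assumptions, the default strict-origin-when-cross-origin policy still tells the CDN which site the visitor is on (https://example.com/), while same-origin hides even that.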