I have an interesting requirement and I am not sure of the best answer. I am turning to Stack Overflow in the hope that someone can share their experience and propose a solution.
Setup
I have a front-facing website powered by Ghost, running in a standard MEAN stack environment, with all traffic handled via Cloudflare.
Problem
I recently became aware that the Cloudflare dashboard reports a large number of requests that do not appear in my Google Analytics. I am aware that some visitors may have JavaScript disabled, but the two figures differ by orders of magnitude. I would very much like to know why.
Hypothesis
I suspect that someone is port scanning or probing my platform for vulnerabilities. Or it could simply be a case of links going astray. Either way, I am not sure.
Solutions
This is the part I am not sure about. What would be the best approach to record and retain HTTP requests? One option I have considered is to use Morgan to stream requests to a .log file and review them at a later date. However, I wonder if there is a more elegant solution.
I welcome any thoughts you may have.
Thanks
Google Analytics is a fair bit more conservative than Cloudflare. One reason, as you mentioned, is that Cloudflare has access to raw HTTP logs instead of having to rely on JavaScript to identify page views. As Cloudflare only counts HTTP requests, port scanning would not be recorded as a hit.
However, even with bots accounted for, Cloudflare may still record requests that Google Analytics cannot, for example AJAX content requests. As the Google Analytics beacon runs only once, when the page is loaded, Google Analytics records a single page view, whereas a page load followed by one AJAX request shows up as 2 HTTP requests in Cloudflare's raw logs.
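To make the difference concrete, here is a minimal sketch that counts the same traffic both ways. The record shape, the `summarize` function, and the "looks like a page view" heuristic (a GET that accepts `text/html` and is not flagged as AJAX) are all assumptions for illustration, not how either Cloudflare or Google Analytics actually classifies traffic.

```javascript
// Hypothetical request records; in practice these would be parsed from a log.
const sampleRequests = [
  { method: 'GET', url: '/', accept: 'text/html,application/xhtml+xml' },
  { method: 'GET', url: '/api/posts', accept: 'application/json', xRequestedWith: 'XMLHttpRequest' },
  { method: 'GET', url: '/assets/app.js', accept: '*/*' },
  { method: 'GET', url: '/favicon.ico', accept: 'image/*' },
];

// Count every request (roughly what an edge proxy sees) versus the
// requests that look like a browser loading an HTML page.
function summarize(requests) {
  let likelyPageViews = 0;
  for (const r of requests) {
    const wantsHtml = (r.accept || '').includes('text/html');
    const isAjax = r.xRequestedWith === 'XMLHttpRequest';
    if (r.method === 'GET' && wantsHtml && !isAjax) {
      likelyPageViews += 1;
    }
  }
  return { totalRequests: requests.length, likelyPageViews };
}

console.log(summarize(sampleRequests));
```

With the sample data, one visit produces four HTTP requests but only one that resembles a page view, which is the kind of gap the two tools can report.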
For details, please see the following blog post, which explains how Google Analytics and Cloudflare Analytics can differ: Understanding Analytics: When Is a Page View Not a Page View?
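As for recording requests at the origin, the following is a rough sketch using only Node's built-in `http` and `fs` modules, writing one JSON object per line. The names `formatLogLine` and `createLoggingServer` are hypothetical, and this is a standalone illustration rather than something wired into Ghost. Morgan's `stream` option should achieve much the same with less code. Behind Cloudflare the socket address belongs to Cloudflare, so the sketch reads the visitor address from the `CF-Connecting-IP` header instead.

```javascript
const fs = require('fs');
const http = require('http');

// Build one JSON log line from a request. Node lowercases header names.
function formatLogLine(req, now = new Date()) {
  const headers = req.headers || {};
  return JSON.stringify({
    time: now.toISOString(),
    // Prefer the visitor address that Cloudflare forwards, if present.
    ip: headers['cf-connecting-ip'] || (req.socket && req.socket.remoteAddress) || null,
    method: req.method,
    url: req.url,
    userAgent: headers['user-agent'] || null,
    referer: headers['referer'] || null,
  });
}

// Wrap an existing request handler so every request is appended to logPath.
function createLoggingServer(logPath, handler) {
  const stream = fs.createWriteStream(logPath, { flags: 'a' });
  return http.createServer((req, res) => {
    stream.write(formatLogLine(req) + '\n');
    handler(req, res);
  });
}
```

A hypothetical usage would be `createLoggingServer('access.log', (req, res) => res.end('ok')).listen(3000)`. Keep in mind that requests answered from Cloudflare's cache may never reach the origin, so an origin log is likely to undercount relative to Cloudflare's own figures.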