How to validate that a certain domain is reachable from browser?

233 views Asked by At

Our single page app embeds videos from Youtube for the end-users consumption. Everything works great if the user does have access to the Youtube domain and to the content of that domain's pages.

We however frequently run into users whose access to Youtube is blocked by a web filter box on their network, such as https://us.smoothwall.com/web-filtering/ . The challenge here is that the filter doesn't actually kill the request, it simply returns another page instead with a HTTP status 200. The page usually says something along the lines of "hey, sorry, this content is blocked".

One option is to try to fetch https://www.youtube.com/favicon.ico to prove that the domain is reachable. The issue is that these filters usually involve a custom SSL certificate to allow them to inspect the HTTP content (see: https://us.smoothwall.com/ssl-filtering-white-paper/), so I can't rely TLS catching the content being swapped for me with the incorrect certificate, and I will instead receive a perfectly valid favicon.ico file, except from a different site. There's also the whole CORS issue of issuing an XHR from our domain against youtube.com's domain, which means if I want to get that favicon.ico I have to do it JSONP-style. However even by using a plain old <img> I can't test the contents of the image because of CORS, see Get image data in JavaScript? , so I'm stuck with that approach.

Are there any proven and reliable ways of dealing with this situation and testing browser-level reachability towards a specific domain?

Cheers.

1

There are 1 answers

0
jeffmcc On

In general, web proxies that want to play nicely typically annotate the HTTP conversation with additional response headers that can be detected.

So one approach to building a man-in-the-middle detector may be to inspect those response headers and compare the results from when behind the MITM, and when not.

Many public websites will display the headers for a arbitrary request; redbot is one.

So perhaps you could ask the party whose content is being modified to visit a url like: youtube favicon via redbot.

Once you gather enough samples, you could heuristically build a detector.

Also, some CDNs (eg, Akamai) will allow customers to visit a URL from remote proxy locations in their network. That might give better coverage, although they are unlikely to be behind a blocking firewall.