How can it scan all available pages automatically?
One way I can think of is to crawl the site recursively, starting from the home page and following links.
But that won't reach back-end pages such as a CMS admin area, since nothing on the public site links to them.
So how do those scanning tools work?
Stupid web crawler:
Start by creating an array to store links, and put one URL in there yourself. Create a second, empty array to store visited URLs. Now start a program which repeats the following steps:

1. Take a URL out of the links array and move it to the visited array.
2. Download the page at that URL.
3. Scan the downloaded page for links (e.g. the href attribute of every anchor tag).
4. Add every link that appears in neither array to the links array.
If you assume that every page on the web is reachable by following some number of random links (possibly billions), then simply repeating steps 1 through 4 will eventually result in downloading the entire web. Since the web is not actually a fully connected graph, you have to start the process from different points to eventually reach every page.
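Here is a minimal sketch of those four steps in Python, using only the standard library. The seed URL handling and the max_pages cap are my own additions to keep the example bounded; a real crawler would also respect robots.txt, normalize URLs before deduplicating, and throttle its requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects the href value of every anchor tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=50):
    to_visit = [seed_url]  # the links array, seeded with one URL by hand
    visited = set()        # the visited array (a set, for fast membership tests)
    while to_visit and len(visited) < max_pages:
        url = to_visit.pop(0)  # step 1: take a URL and mark it visited
        if url in visited:
            continue
        visited.add(url)
        try:
            # step 2: download the page at that URL
            page = urlopen(url, timeout=5).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to load or decode
        parser = LinkParser()
        parser.feed(page)  # step 3: scan the page for links
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links against the page URL
            if absolute not in visited and absolute not in to_visit:
                to_visit.append(absolute)  # step 4: queue links found in neither array
    return visited

# Usage (assumed example URL): crawl("https://example.com/", max_pages=10)
```

Taking URLs from the front of the queue with pop(0) makes this a breadth-first crawl; popping from the end instead would make it depth-first without changing which pages are eventually reached.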