I can see the _redirTo tag in the status index of Elasticsearch. I have a few questions about redirection:
- Is there any limit on redirection, so that a crawl does not end up in a loop of redirects?
- How can I find out how many redirects a particular fetched URL went through? I can see only one redirect in the _redirTo tag, which is the immediate one. Is there no way to get the count if a URL is redirected two or three times?
You can set a limit on the depth from the seed (see the MaxDepth URL filter), but not directly on the number of successive redirections.
As you noticed, we track only the URL a given document is redirected to.
If you want to control the number of redirects regardless of the distance from the seed, one way would be to extend or modify MetadataTransfer, or to handle the redirects within the protocol implementation, the downside of the latter being that it will not check whether the target URL has already been fetched.
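To make the MetadataTransfer idea more concrete, here is a minimal, self-contained sketch of the counting logic only. It does not use the StormCrawler API: a plain Map stands in for the document metadata, and the key name redirect.count, the class RedirectCounter and the method forRedirectTarget are all hypothetical names made up for illustration. In a real crawl the same increment-and-check would have to live in whatever code builds the metadata for the redirect target.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class RedirectCounter {

    // Hypothetical metadata key; not a built-in StormCrawler key.
    static final String REDIR_COUNT_KEY = "redirect.count";

    /**
     * Builds the metadata for a redirect target from the metadata of the
     * redirecting URL, incrementing the redirect counter. Returns an empty
     * Optional when the target would exceed maxRedirects and should be dropped.
     */
    static Optional<Map<String, String>> forRedirectTarget(
            Map<String, String> parent, int maxRedirects) {
        // A missing counter means the parent was not itself a redirect target.
        int count = Integer.parseInt(parent.getOrDefault(REDIR_COUNT_KEY, "0")) + 1;
        if (count > maxRedirects) {
            return Optional.empty();
        }
        // Copy so the parent's metadata is left untouched.
        Map<String, String> child = new HashMap<>(parent);
        child.put(REDIR_COUNT_KEY, Integer.toString(count));
        return Optional.of(child);
    }

    public static void main(String[] args) {
        int limit = 2; // assumed limit, for the demo only
        Optional<Map<String, String>> md = Optional.of(new HashMap<>());
        // Follow a chain of redirects until the limit drops the target.
        for (int hop = 1; md.isPresent(); hop++) {
            md = forRedirectTarget(md.get(), limit);
            System.out.println("hop " + hop + ": "
                    + md.map(Object::toString).orElse("dropped"));
        }
    }
}
```

Because the counter travels with the metadata, the stored value would also tell you how many redirects led to a given URL, which is the count the _redirTo tag alone does not provide. A cap like this also bounds redirect loops, since a looping chain eventually exceeds the limit.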
UPDATE: There is a config element called 'redirections.allowed' with a default value of true. I've just pushed a fix for SimpleFetcherBolt, where it wasn't handled properly.