Is there any limit on redirects in StormCrawler?


I can see the _redirTo tag in the Status index of Elasticsearch. I have a few questions about redirection:

  1. Is there any limit on redirections, so that a crawl does not end up in a redirect loop?
  2. How can I tell how many redirects a particular fetched URL went through? The _redirTo tag only shows the immediate redirect, so I cannot get a count when a URL is redirected two or three times.

There is 1 answer

Julien Nioche (best answer)

You can limit the depth from the seed (see the MaxDepth URL filter), but there is no direct limit on the number of successive redirections.

As you noticed, we track only the URL a given document is redirected to.

If you want to control the number of redirects regardless of the distance from the seed, one way would be to extend or modify MetadataTransfer. Another would be to handle the redirects within the protocol implementation, the downside being that this will not check whether the target URL has already been fetched.
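To illustrate the first option, here is a minimal sketch of the bookkeeping such an extension could do. It is plain Java and deliberately does not use the StormCrawler API: the class RedirectBudget, the method forRedirectTarget and the metadata key "redirect.count" are all hypothetical names made up for this example, and a plain Map stands in for the URL metadata. The idea is simply to carry a hop counter from a URL to its redirect target and stop once a maximum is reached.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of StormCrawler.
class RedirectBudget {

    // Hypothetical metadata key used to carry the hop count along the chain.
    static final String REDIRECT_COUNT_KEY = "redirect.count";

    /**
     * Builds the metadata to attach to a redirect target.
     * Returns an empty Optional when following the redirect would exceed
     * maxRedirects, meaning the target should not be emitted.
     */
    static Optional<Map<String, String>> forRedirectTarget(
            Map<String, String> sourceMetadata, int maxRedirects) {
        // A URL that was not itself reached via a redirect has no counter yet.
        int previousHops =
                Integer.parseInt(sourceMetadata.getOrDefault(REDIRECT_COUNT_KEY, "0"));
        int hops = previousHops + 1;
        if (hops > maxRedirects) {
            return Optional.empty();
        }
        // Copy so the source URL's metadata is left untouched.
        Map<String, String> targetMetadata = new HashMap<>(sourceMetadata);
        targetMetadata.put(REDIRECT_COUNT_KEY, Integer.toString(hops));
        return Optional.of(targetMetadata);
    }

    public static void main(String[] args) {
        // Simulate a chain of redirects with a budget of 2 hops.
        Optional<Map<String, String>> current = Optional.of(new HashMap<>());
        int followed = 0;
        while (current.isPresent()) {
            current = forRedirectTarget(current.get(), 2);
            if (current.isPresent()) {
                followed++;
            }
        }
        System.out.println("Redirects followed: " + followed);
    }
}
```

Because the counter travels with the metadata, it would also answer the second question: the value stored for a URL is the number of redirects that led to it. In a real topology the same logic would have to live wherever the metadata for a redirect target is built, which is why extending MetadataTransfer is suggested above.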

UPDATE: There is a config element called 'redirections.allowed' with a default value of true. I've just pushed a fix for SimpleFetcherBolt, as it wasn't handling it properly.