Is it possible to pause and resume crawling using Java crawler crawler4j?

Question


292 views Asked by Milan Verescak At 16 October 2017 at 11:19

I already know that you can configure crawling to be resumable.

But is it possible to use the resumable functionality to pause the crawling process and then resume it later programmatically? For example, I could gracefully shut down crawling with the crawler's shutdown method while the resumable parameter is set to true, and then start crawling again.

Will it work this way, given that the primary purpose of the resumable parameter is to handle accidental crashes of the crawler? Is there any other or better way to achieve this with crawler4j?


There is 1 answer

**rzo1** · Answer 1 · 2018-01-26T13:17:02+00:00

If you set the parameter resumable to true, the Frontier as well as the DocIdServer will store their queues in the user-defined storage folder.

This works for both a crash and a programmatic shutdown. In both cases, the storage folder must be the same when the crawl is started again.
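
As a rough illustration of the above, here is a minimal sketch of pausing via shutdown and resuming with a fresh controller that points at the same storage folder. The class names `PauseResumeSketch` and `MyCrawler`, the seed URL, the storage path, the thread count, and the 30-second pause point are all assumptions made for the example. It also assumes that the first controller has fully released the storage folder by the time `waitUntilFinish()` returns; that may be worth verifying against the crawler4j version in use.

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

public class PauseResumeSketch {

    // Hypothetical crawler used only for illustration.
    public static class MyCrawler extends WebCrawler {
        @Override
        public boolean shouldVisit(Page referringPage, WebURL url) {
            // Stay on the (assumed) seed site.
            return url.getURL().startsWith("https://example.com/");
        }

        @Override
        public void visit(Page page) {
            System.out.println("Visited: " + page.getWebURL().getURL());
        }
    }

    // Builds a controller with resumable crawling enabled.
    // The storage folder must be identical across runs.
    static CrawlController newController(String storageFolder) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder(storageFolder);
        config.setResumableCrawling(true);

        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtServer robotstxtServer =
                new RobotstxtServer(new RobotstxtConfig(), pageFetcher);
        return new CrawlController(config, pageFetcher, robotstxtServer);
    }

    public static void main(String[] args) throws Exception {
        String storageFolder = "/tmp/crawler4j-storage"; // assumed path

        // First run: start without blocking, then "pause" by shutting down.
        CrawlController first = newController(storageFolder);
        first.addSeed("https://example.com/");
        first.startNonBlocking(MyCrawler.class, 2);
        Thread.sleep(30_000);    // arbitrary pause point for the example
        first.shutdown();        // request a graceful stop
        first.waitUntilFinish(); // block until the crawler threads have stopped

        // Later: "resume" with a new controller on the same storage folder.
        // No seed is added here; the pending URLs come from the stored state.
        CrawlController second = newController(storageFolder);
        second.start(MyCrawler.class, 2); // blocks until the crawl finishes
    }
}
```

In practice the second half would usually run in a later process rather than immediately after the first; the only link between the two runs is the shared storage folder.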

See also the related issue on the official issue tracker.
