Is it possible to pause and resume crawling using the Java crawler crawler4j?

I already know that you can configure crawling to be resumable.

But is it possible to use the resumable functionality to pause the crawling process programmatically and resume it later? For example, could I gracefully shut down crawling with the crawler's shutdown method while the resumable parameter is set to true, and then start crawling again?

Will it work this way, given that the primary purpose of the resumable parameter is to recover from accidental crashes of the crawler? Is there another, better way to achieve this with crawler4j?

Answer by rzo1:

If you set the resumable parameter to true (CrawlConfig.setResumableCrawling(true)), both the Frontier and the DocIdServer persist their data in the user-defined crawl storage folder instead of discarding it when the crawl ends.

This works both after a crash and after a programmatic shutdown. In both cases, the next run must use the same storage folder.
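A minimal sketch of that pause/resume cycle, assuming the crawler4j 4.x API (edu.uci.ics.crawler4j packages and the CrawlController(CrawlConfig, PageFetcher, RobotstxtServer) constructor); newer releases and forks may use different constructors. MyCrawler, the seed URL, the 30-second pause and the storage path are hypothetical placeholders. The first controller is started with startNonBlocking so the calling thread can decide when to "pause" via shutdown() and waitUntilFinish(); a second controller is then built on the same storage folder with resumable crawling enabled, so it should pick up the persisted queue. The seed is deliberately not re-added in the second run, since the pending URLs are expected to come from the storage folder.

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

public class PauseResumeExample {

    // Hypothetical crawler: stays on one host and prints each visited URL.
    public static class MyCrawler extends WebCrawler {
        @Override
        public boolean shouldVisit(Page referringPage, WebURL url) {
            return url.getURL().startsWith("https://example.com/");
        }

        @Override
        public void visit(Page page) {
            System.out.println("Visited: " + page.getWebURL().getURL());
        }
    }

    // Builds a controller whose crawl state lives in storageFolder.
    static CrawlController newController(String storageFolder) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder(storageFolder); // must be identical across runs
        config.setResumableCrawling(true);           // keep the crawl state on disk
        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtServer robotstxtServer =
                new RobotstxtServer(new RobotstxtConfig(), pageFetcher);
        return new CrawlController(config, pageFetcher, robotstxtServer);
    }

    public static void main(String[] args) throws Exception {
        String storageFolder = "/tmp/crawler4j-storage"; // hypothetical path

        // Run 1: crawl in the background for a while, then "pause".
        CrawlController first = newController(storageFolder);
        first.addSeed("https://example.com/");
        first.startNonBlocking(MyCrawler.class, 4);
        Thread.sleep(30_000);     // stand-in for whatever triggers the pause
        first.shutdown();         // ask the crawler threads to stop
        first.waitUntilFinish();  // block until the controller reports it has finished

        // Run 2: "resume" with a fresh controller on the same storage folder.
        CrawlController second = newController(storageFolder);
        second.start(MyCrawler.class, 4); // blocks until the crawl completes
    }
}
```

Because the pause is just an ordinary shutdown, the same pattern also covers the crash case: whatever was persisted in the storage folder is what the next controller resumes from.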

See also the related issue on the official issue tracker.