crawler4j not working while using it with TimerTask


We are trying to use crawler4j to crawl a particular website at a fixed interval, so we run the crawler from a java.util.Timer. But after the first successful crawl triggered by the timer, the console always shows:

It looks like no thread is working, waiting for 10 seconds to make sure...
No thread is working and no more URLs are in queue waiting for another 10 seconds to make sure...
All of the crawlers are stopped. Finishing the process...
Waiting for 10 seconds before final clean up...
CrawlerScheduler finished at:Wed Nov 19 18:41:36 IST 2014

on every subsequent run triggered by the timer. The crawler does not crawl anything again. We went through the source code to figure out the reason, but could not find it.

Here is the code:

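import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.logging.Level;
import java.util.logging.Logger;

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class CrawlerScheduler extends TimerTask {

    @Override
    public void run() {
        try {
            System.out.println("CrawlerScheduler started at:" + new Date());
            int numberOfCrawlers = 1;
            // The same storage folder is used on every run.
            String crawlStorageFolder = ".";
            CrawlConfig crawlConfig = new CrawlConfig();
            crawlConfig.setCrawlStorageFolder(crawlStorageFolder);
            PageFetcher pageFetcher = new PageFetcher(crawlConfig);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(crawlConfig, pageFetcher, robotstxtServer);
            controller.addSeed("http://wwwnc.cdc.gov/travel/destinations/list");
            // Crawler is our WebCrawler subclass (not shown); start() blocks until the crawl finishes.
            controller.start(Crawler.class, numberOfCrawlers);
            System.out.println("CrawlerScheduler finished at:" + new Date());
        } catch (Exception ex) {
            Logger.getLogger(CrawlerScheduler.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    public static void main(String[] args) {
        TimerTask timerTask = new CrawlerScheduler();
        Timer timer = new Timer();
        // First run after 10 ms, then one run every 6 minutes.
        timer.scheduleAtFixedRate(timerTask, 10, 6 * 60 * 1000);
        try {
            // The Timer thread is not a daemon, so the JVM keeps running after main returns.
            Thread.sleep(3000);
        } catch (InterruptedException ex) {
            Logger.getLogger(CrawlerScheduler.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

}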
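
The "finished at" line in the output suggests the timer itself keeps firing, so the problem is probably in the crawl rather than in the scheduling. To check that assumption in isolation, here is a minimal sketch that uses only the JDK and replaces the crawl with a counter. The names TimerProbe and countRuns are hypothetical and are not part of crawler4j or of our project.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: checks that a single TimerTask instance is run
// repeatedly by scheduleAtFixedRate, with the crawl replaced by a counter.
class TimerProbe {

    // Schedules a trivial task and waits until it has run expectedRuns times
    // (or a timeout passes). Returns the number of runs that were observed.
    static int countRuns(int expectedRuns, long periodMillis) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(expectedRuns);
        Timer timer = new Timer("probe-timer");

        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                // Stand-in for the crawl; stop counting once the target is reached.
                if (done.getCount() > 0) {
                    runs.incrementAndGet();
                    done.countDown();
                }
            }
        }, 10, periodMillis);

        // Give the timer enough time for all runs, plus a generous margin.
        done.await(expectedRuns * periodMillis + 5000, TimeUnit.MILLISECONDS);
        timer.cancel();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("runs observed: " + countRuns(3, 200L));
    }
}
```

If the count reaches the expected value, the same TimerTask instance is being invoked on every period, which would point at state carried between crawls (for example the shared storage folder) rather than at the Timer. That is a guess on our side, not something we have confirmed.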
