Using a Kafka topic to feed seed URLs to StormCrawler

We want to feed seed URLs from a Kafka topic into a StormCrawler-based project. Do we need to change StormCrawler itself?

1 Answer

Julien Nioche:

Obviously, you'd need to change the topology a bit: add a KafkaSpout and connect it to the StatusUpdaterBolt, the same way the ES archetype does with the FileSpout. The KafkaSpout will have to generate the same sort of output as the FileSpout on the status stream, i.e. a URL, its metadata, and a status (with the value DISCOVERED). If that is difficult, you can insert a bolt between the KafkaSpout and the StatusUpdaterBolt to convert the plain strings into that output, as sketched below.
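
Here is a minimal sketch of such a converter bolt. It assumes the KafkaSpout is used with the default record translator from storm-kafka-client (which exposes the record payload under a "value" field) and that each record value is a plain URL string; the class name KafkaToStatusBolt is made up for illustration.

    import java.util.Map;

    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    import com.digitalpebble.stormcrawler.Constants;
    import com.digitalpebble.stormcrawler.Metadata;
    import com.digitalpebble.stormcrawler.persistence.Status;

    // Hypothetical bolt: turns a plain URL string coming from the KafkaSpout
    // into the (url, metadata, status) tuple the StatusUpdaterBolt expects
    public class KafkaToStatusBolt extends BaseRichBolt {

        private OutputCollector collector;

        @Override
        public void prepare(Map stormConf, TopologyContext context,
                OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            // "value" is the default field name used by storm-kafka-client
            String url = input.getStringByField("value");
            // emit on the status stream with empty metadata and a DISCOVERED
            // status, matching what the FileSpout produces
            collector.emit(Constants.StatusStreamName, input,
                    new Values(url, new Metadata(), Status.DISCOVERED));
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declareStream(Constants.StatusStreamName,
                    new Fields("url", "metadata", "status"));
        }
    }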
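
The wiring might then look roughly like this, assuming the Elasticsearch StatusUpdaterBolt from the ES archetype; the broker address, the topic name "seeds", and the component ids are placeholders. Grouping the status stream by URL is one common choice, so that all updates for a given URL land on the same StatusUpdaterBolt instance.

    import org.apache.storm.kafka.spout.KafkaSpout;
    import org.apache.storm.kafka.spout.KafkaSpoutConfig;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.tuple.Fields;

    import com.digitalpebble.stormcrawler.Constants;
    import com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt;

    public class SeedInjectionTopology {

        public static TopologyBuilder build() {
            // placeholder broker address and topic name
            KafkaSpoutConfig<String, String> spoutConfig =
                    KafkaSpoutConfig.builder("localhost:9092", "seeds").build();

            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("kafka", new KafkaSpout<>(spoutConfig));
            builder.setBolt("toStatus", new KafkaToStatusBolt())
                    .shuffleGrouping("kafka");
            // field grouping on the URL keeps updates for a given URL together
            builder.setBolt("status", new StatusUpdaterBolt())
                    .fieldsGrouping("toStatus", Constants.StatusStreamName,
                            new Fields("url"));
            return builder;
        }
    }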