state saving of storm bolts which do periodic aggregation and save agg result to db

761 views Asked by At

I’ve seen several simple aggregation examples on the web. Can’t find one that answers my question though. I’m wondering if Zookeeper saves the states of bolts so if 1 aggregation bolt crashes, then when it restarts the worker the worker will start from previous state. I use acks (and might do batch processing as well.)

For example, let’s say I have to count every minute how many words of the same type I find and store them in a db. My bolt would keep counters for each work and at the end of every minute dump the counters it holds in memory to db.

eg: input: The peanut is great. The ocean is great.
Bolt state after input is processed:
the:2
peanut:1
is:2
great:2
ocean:1

(I hope I don't need Trident for this.)
So if the bolt crashes before it commits to db the counters, does Zookeeper save that state?
If not, then do you have suggestions/links on what the best way to do this is?

Thanks

1

There are 1 answers

2
Vishal John On BEST ANSWER

Zookeeper is used to co-ordinate the nodes in the cluster. I dont think it is used to save the internal state of bolts. Unfortunately i cannot find a link where it is explicitly mentioned

Also you should take care of common problems while designing these sort of "aggregator" topologies. Suppose bolt A is handling the word "The" and B is handling word ocean. Assume that you spout crashed just afte emitting "The ocean is great". Bolt 'A' would have recived the word "The" and incremented it while 'B' never received any input.

Now when the spout is back and it sends "The ocean is great" again, bolt A should not over count the word "The". This logic will have to be handled by the application developer.

Trident takes care of these situations using transaction ids. Its worth to take a look at it.

Please have a look at these wikis -

  1. https://github.com/nathanmarz/storm/wiki/Trident-state

  2. https://github.com/nathanmarz/storm/wiki/Transactional-topologies

You will get some insights on how to design your topology.