Sync several spouts and bolts in Heron


I am using heronpy. My topology looks like this:

Spout1 -> Bolt1 -> Bolt2

Spout2 ---------> Bolt2

So, Bolt2 needs data from both Spout2 and Bolt1 in order to emit a result. However, the data from Spout2 arrives faster than the data from Bolt1, so Bolt2 has to wait until it has both. How can I synchronize the flow using the Heron API so that Bolt2 emits the result only after all the data is available?


There are 2 answers

Neng On

Heron doesn't synchronize tuples from different components automatically. So you will need to buffer the tuples from Spout2 and wait until the corresponding tuples from Bolt1 arrive, then do the computation.
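A minimal sketch of that buffering in plain Python, assuming every tuple carries a key that says which Spout2 tuple belongs with which Bolt1 tuple. The `JoinBuffer` class, the `"spout2"`/`"bolt1"` source labels and the key are hypothetical names for illustration, not part of the Heron API. In a bolt you would call `add` from the tuple-processing method and emit whenever it returns a pair. Both inputs would presumably also need to be grouped on that key so that matching tuples reach the same Bolt2 instance.

```python
class JoinBuffer:
    """Holds tuples from one side until the matching tuple from the other side arrives."""

    def __init__(self):
        # One pending dict per upstream component, keyed by the join key.
        self.pending = {"spout2": {}, "bolt1": {}}

    def add(self, source, key, value):
        """Store a value; return (spout2_value, bolt1_value) once both sides are present."""
        other = "bolt1" if source == "spout2" else "spout2"
        if key in self.pending[other]:
            match = self.pending[other].pop(key)
            return (value, match) if source == "spout2" else (match, value)
        self.pending[source][key] = value
        return None


buf = JoinBuffer()
first = buf.add("spout2", 7, "fast")    # Bolt1's tuple has not arrived yet
second = buf.add("bolt1", 7, "slow")    # now both halves are available
```

Entries that never find a match stay in the buffer, so a real bolt would also need some eviction policy (a size cap or a timeout).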

Ning Wang On

In general, ordering and synchronization are not guaranteed in streaming. I suspect it is hard even for spout1 and spout2 themselves to be 100% synchronized.

One option you might consider: connect spout2 to bolt1 and have bolt1 emit spout2's tuples itself, then disconnect bolt2 from spout2. That way bolt1 becomes the source of truth for the ordering.

Windowing might be another option, but it requires more thought and work.
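To make the windowing idea concrete, here is a rough sketch of a tumbling (fixed-size, non-overlapping) window in plain Python. It does not use Heron's own windowing support. `TumblingWindowJoin`, its methods, and the source labels are hypothetical, and it assumes each tuple carries a timestamp in seconds.

```python
from collections import defaultdict


class TumblingWindowJoin:
    """Groups tuples into fixed-size time windows and hands back windows that have ended."""

    def __init__(self, size_secs):
        self.size_secs = size_secs
        # window index -> {"spout2": [...], "bolt1": [...]}
        self.windows = defaultdict(lambda: {"spout2": [], "bolt1": []})

    def add(self, source, timestamp, value):
        # Tuples whose timestamps fall in the same interval share a window index.
        self.windows[int(timestamp // self.size_secs)][source].append(value)

    def close_before(self, now):
        """Pop and return every window that ended at or before `now`, oldest first."""
        current = int(now // self.size_secs)
        done = sorted(w for w in self.windows if w < current)
        return [(w, self.windows.pop(w)) for w in done]


join = TumblingWindowJoin(size_secs=10)
join.add("spout2", 1.0, "a")
join.add("bolt1", 8.5, "A")      # arrives later but lands in the same window 0
join.add("spout2", 12.0, "b")    # window 1 is still open at t=15
closed = join.close_before(15.0)
```

Part of the extra work is deciding what to do with late tuples. In this sketch, a tuple that arrives after its window was closed simply reopens that window on its own, without its partner.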