I want to use Storm to compute the mean of incoming tuples of the form [int id, int value]. Since the mean is over the whole stream, I can't partition the data with a fields grouping. I need a topology architecture to distribute this computation, and the only approach I can think of is to compute partial aggregates (mini-batches) within each bolt instance and then combine them.
From what I understand, Trident is the appropriate tool for mini-batch processing in Storm.
What is the best practice for computing global analytics with Storm, such as means, global counts, and standard deviations, when you can't partition the data by an attribute? Is there an example topology?
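To make the idea concrete, here is roughly the shape I have in mind in plain Storm (all class names, the batch size, and the `ValueSpout` are made up for illustration): several partial-aggregation bolts behind a shuffle grouping, each flushing a local (sum, count) every N tuples, and a single bolt behind a global grouping that merges the partials:

```java
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class MeanTopology {

    // Each instance accumulates a local (sum, count) and flushes the
    // partial result every BATCH_SIZE tuples (batch size is arbitrary).
    public static class PartialSumCountBolt extends BaseRichBolt {
        private static final int BATCH_SIZE = 1000;
        private OutputCollector collector;
        private double sum;
        private long count;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            sum += tuple.getIntegerByField("value");
            count++;
            if (count >= BATCH_SIZE) {
                collector.emit(new Values(sum, count)); // flush partial (sum, count)
                sum = 0.0;
                count = 0L;
            }
            collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("partialSum", "partialCount"));
        }
    }

    // Single instance; merges the partials into a running global mean.
    public static class GlobalMeanBolt extends BaseRichBolt {
        private OutputCollector collector;
        private double totalSum;
        private long totalCount;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            totalSum += tuple.getDoubleByField("partialSum");
            totalCount += tuple.getLongByField("partialCount");
            collector.emit(new Values(totalSum / totalCount)); // running mean
            collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("mean"));
        }
    }

    public static TopologyBuilder build() {
        TopologyBuilder builder = new TopologyBuilder();
        // ValueSpout is hypothetical: it stands in for whatever emits [id, value].
        builder.setSpout("values", new ValueSpout());
        builder.setBolt("partial-mean", new PartialSumCountBolt(), 4)
               .shuffleGrouping("values");
        builder.setBolt("global-mean", new GlobalMeanBolt(), 1)
               .globalGrouping("partial-mean");
        return builder;
    }
}
```

Is something like this the right pattern, or does Trident already give me this for free?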
You can easily compute stream statistics such as mean, standard deviation, and count using Trident-ML. There's a section in its README that explains how to compute these statistics within a Trident topology.
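If you'd rather not pull in Trident-ML, here is a minimal core-Trident sketch of the same idea (note this is not Trident-ML's API; the package names assume a pre-Apache Storm release with `storm.trident.*`, and the spout data is made up). A `CombinerAggregator` over (sum, count) pairs lets Trident pre-aggregate on each partition before a single global combine, which is exactly the mini-batch-then-merge pattern you describe:

```java
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.CombinerAggregator;
import storm.trident.testing.FixedBatchSpout;
import storm.trident.testing.MemoryMapState;
import storm.trident.tuple.TridentTuple;

public class GlobalMeanTopology {

    // Running (sum, count) pair; the global mean is sum / count.
    public static class SumCount implements java.io.Serializable {
        public final double sum;
        public final long count;

        public SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        public double mean() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    public static class MeanAggregator implements CombinerAggregator<SumCount> {
        @Override
        public SumCount init(TridentTuple tuple) {
            // Each tuple contributes its value and a count of 1.
            return new SumCount(tuple.getIntegerByField("value"), 1);
        }

        @Override
        public SumCount combine(SumCount a, SumCount b) {
            // Associative merge: Trident runs it within each partition
            // first, then once more to combine the partition partials.
            return new SumCount(a.sum + b.sum, a.count + b.count);
        }

        @Override
        public SumCount zero() {
            return new SumCount(0.0, 0);
        }
    }

    public static TridentState buildState() {
        // Toy in-memory spout standing in for the real [id, value] stream.
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("id", "value"), 3,
                new Values(1, 10), new Values(2, 20), new Values(3, 30));

        TridentTopology topology = new TridentTopology();
        // No groupBy: one global aggregate, persisted across batches. Query
        // the state (e.g. via DRPC) and call mean() on the stored SumCount
        // to read the current global mean.
        return topology.newStream("values", spout)
                .persistentAggregate(new MemoryMapState.Factory(),
                        new Fields("value"), new MeanAggregator(),
                        new Fields("sumCount"));
    }
}
```

Because the `combine` step is associative, this scales without any fields grouping: no single node ever sees more than its partition's tuples plus a handful of partials.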
Hope it helps.