How to perform multiple window aggregation with update in Flink?

953 views Asked by At

I have a use case wherein I'm receiving a stream of events containing different sets of information and want to perform aggregations on them. For each of these aggregations, there are multiple tumbling windows which are needed eg: Daily, Weekly, Monthly, Yearly etc.
The aggregations initially are basic addition of the counts seen but could later be some analytics/joins handling across these events. So if an event A comes once everyday and another event B comes once every week, the result would be something like this:

Daily
     A: 1
     B: 1 (Only for the day it was received)
Weekly
     A: 7
     B: 1
Monthly
     A: 30 (30 day month)
     B: 4 (5 in some cases)
Yearly
     A: 365
     B: 52 (53 in some cases)

The usecase is only around tumbling windows and not sliding windows and I'm looking at how to implement this usecase. The main problem is that I don't want to wait until the end of the window and want to keep receiving updates every 10 minutes or so.
I took a look at flink and there are some ways we could do it such as using a ProcessWindow function, incremental aggregation, stream slicing, broadcasting state etc. but since I'm pretty new to flink I'm not completely sure on what to use and if there are any pitfalls I'm missing.

Would be great if anyone could help me out.

1

There are 1 answers

0
David Anderson On

The choices for implementing windows on Flink are

  1. Flink SQL
  2. the DataStream Window API
  3. a ProcessFunction

I don't think your requirement to produce updates every 10 minutes is a good fit for SQL.

As for the Window API, the built-in TimeWindow window assigner doesn't support months and years, and the requirement to product updates every 10 minutes requires a custom Trigger. With enough effort you could overcome these limitations, but I don't think it's worth it.

I would instead implement this using a ProcessFunction. The training that is embedded the Flink docs has an example of how to use a process function to implement tumbling time windows that you can use as a starting point. Extending that example to meet your requirements shouldn't be very difficult.