How to Control Size of Flink Checkpoints


I am running a simple Flink aggregation job that consumes from Kafka and applies multiple sliding windows (1 hour, 2 hours, ... up to 24 hours), each with a specific slide interval, aggregating over each window. Sometimes the job restarts and we lose data, because the windows start over from the latest Kafka offsets. To overcome this we enabled checkpointing, and I can see the checkpoint size growing (config: HashMapStateBackend with HDFS storage). What is the best way to checkpoint a forever-running Flink job, and can we control the size of the checkpoints, since they will be huge after a few days of running?

Tried enabling checkpointing with the HashMapStateBackend and HDFS checkpoint storage. A minimal sketch of that configuration is below (Flink 1.13+ style; the interval, path, and tuning values are placeholders, not recommendations):
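
    import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Checkpoint every 5 minutes with exactly-once semantics (interval is a placeholder).
    env.enableCheckpointing(5 * 60 * 1000L, CheckpointingMode.EXACTLY_ONCE);

    // Keep live state on the JVM heap...
    env.setStateBackend(new HashMapStateBackend());

    // ...and snapshot it to HDFS on each checkpoint (the path is hypothetical).
    env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");

    // Optional hygiene for a forever-running job: avoid stacking up
    // concurrent checkpoints and leave breathing room between them.
    env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
    env.getCheckpointConfig().setMinPauseBetweenCheckpoints(60 * 1000L);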


1 Answer

Answered by kkrugler

The Flink window code should clean up state after a window has expired. Note that this assumes your workflow is running in event-time mode and supplying appropriate watermarks. Also, if you configure an allowed lateness, then the wall-clock time at which the window state is actually removed depends on both the watermark progress and that allowed lateness: state is kept until the watermark passes the window end plus the allowed lateness.
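
As a sketch of what that setup looks like, assuming a hypothetical Event POJO with key and timestampMillis fields, a hypothetical MyAggregateFunction, and placeholder values for the out-of-orderness bound, window sizes, and lateness:

    import java.time.Duration;

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // `events` is assumed to be the DataStream<Event> coming from your Kafka source.
    DataStream<Event> withTimestamps = events.assignTimestampsAndWatermarks(
            WatermarkStrategy
                    // The 30-second out-of-orderness bound is a guess; tune for your data.
                    .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(30))
                    .withTimestampAssigner((event, ts) -> event.timestampMillis));

    withTimestamps
            .keyBy(event -> event.key)
            .window(SlidingEventTimeWindows.of(Time.hours(1), Time.minutes(5)))
            // Window state is retained until the watermark passes
            // window end + allowed lateness, then it is cleaned up.
            .allowedLateness(Time.minutes(10))
            .aggregate(new MyAggregateFunction());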

Separately, you'll have window state for each sliding window x each unique key. So if you have, say, a 24-hour window sliding every 1 minute, then each key has 1440 overlapping windows alive at any moment, i.e. (1440 x number of unique keys) windows in total, which can cause an explosion in the size of your state. See the sketch below.
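
To make the arithmetic concrete, here is the window assigner that example describes (the 24 h / 1 min figures come from the example above):

    import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // Each element is assigned to (window size / slide) overlapping windows.
    // With a 24-hour window sliding every minute, that is 24 * 60 = 1440 windows,
    // so live window state is roughly 1440 x (number of distinct keys) accumulators.
    SlidingEventTimeWindows assigner =
            SlidingEventTimeWindows.of(Time.hours(24), Time.minutes(1));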