I am investigating using Flink with a Kinesis stream as a source. I would like to use Event Time watermarking. Planning on running this on AWS managed Flink (Kinesis Analytics) platform.
Looking at the AWS documentation and indeed Flink documentation it is recommended to use the FlinkKinesisConsumer.
To enable EventTime on this consumer I see that the recommendation is to use a custom AssignerWithPeriodicWatermarks and set it on the FlinkKinesisConsumer with setPeriodicWatermarkAssigner.
However, I also read in the Flink documentation that this API is deprecated and that it is advised to use a WatermarkStrategy instead.
My questions:
- Is it possible to use a WatermarkStrategy on the Kinesis consumer, or must it be applied after a non-source operation on the DataStream itself (discouraged in the Flink docs)?
- If it is not possible and it must be applied after a non-source operation, what does this mean? Why is it discouraged? How will it affect the performance of the workload?
- Or is it recommended to continue to use a deprecated API?
- Or is there another Kinesis Flink consumer that can be recommended?
Thanks in advance for any suggestions
Alexis
It's not possible to set it directly on FlinkKinesisConsumer (except, as you pointed out, by using the deprecated AssignerWithPeriodicWatermarks interface), nor on any other implementation of SourceFunction, but you can set the watermarks as soon as you get the DataStream. Judging from the comments I think you've already figured it out, but this is what I'm doing.
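A minimal sketch of that approach, assuming the Flink 1.11+ DataStream API and the Kinesis connector on the classpath. The stream name "my-stream", the eventTimeMillis field, and the caller-supplied deserialization schema and consumer properties are placeholders for illustration, not details from the question.

```java
import java.util.Properties;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;

public final class KinesisEventTimeSketch {

    /** Hypothetical record type; replace with whatever is in your stream. */
    public static class MyDataType {
        public long eventTimeMillis; // assumed: event time in epoch milliseconds
    }

    /**
     * Builds the Kinesis source and assigns timestamps and watermarks
     * directly after it, before any other operator touches the stream.
     */
    static DataStream<MyDataType> kinesisWithEventTime(
            StreamExecutionEnvironment env,
            DeserializationSchema<MyDataType> schema, // hypothetical, supplied by the caller
            Properties consumerConfig) {              // AWS region, credentials, etc.

        FlinkKinesisConsumer<MyDataType> consumer =
                new FlinkKinesisConsumer<>("my-stream", schema, consumerConfig);

        // Assumes timestamps never decrease within a parallel source instance.
        WatermarkStrategy<MyDataType> strategy =
                WatermarkStrategy.<MyDataType>forMonotonousTimestamps()
                        .withTimestampAssigner(
                                (event, recordTimestamp) -> event.eventTimeMillis);

        return env.addSource(consumer).assignTimestampsAndWatermarks(strategy);
    }
}
```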
You'll have to tweak this to your use case. Obviously replace MyDataType with whatever is actually in your stream, but also note that forMonotonousTimestamps may not be suited to your use case, and forBoundedOutOfOrderness may work better.
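To see why the choice between the two matters, here is a plain-Java model of the watermark arithmetic (not Flink code; the class and method names are made up for illustration). It assumes the watermark is the largest timestamp seen so far, minus the allowed out-of-orderness, minus one millisecond, with a bound of zero standing in for forMonotonousTimestamps. As a simplification it recomputes the watermark after every event, whereas Flink emits periodic watermarks on a timer.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of bounded-out-of-orderness watermarks; not Flink code. */
final class WatermarkSketch {

    /**
     * Returns the timestamps that arrive at or below the current watermark,
     * i.e. the events that would be treated as late.
     *
     * @param timestamps              event times in arrival order (epoch millis)
     * @param maxOutOfOrdernessMillis allowed lateness bound; 0 models monotonous timestamps
     */
    static List<Long> lateEvents(long[] timestamps, long maxOutOfOrdernessMillis) {
        List<Long> late = new ArrayList<>();
        long maxSeen = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;
        for (long ts : timestamps) {
            if (ts <= watermark) {
                late.add(ts); // arrived after the watermark already passed it
            }
            maxSeen = Math.max(maxSeen, ts);
            // Simplification: advance the watermark after every event.
            watermark = maxSeen - maxOutOfOrdernessMillis - 1;
        }
        return late;
    }

    public static void main(String[] args) {
        long[] arrivals = {1000L, 3000L, 2000L}; // the third event is out of order
        System.out.println("bound 0 ms:    late = " + lateEvents(arrivals, 0L));
        System.out.println("bound 1500 ms: late = " + lateEvents(arrivals, 1500L));
    }
}
```

With a bound of zero the out-of-order event at 2000 falls behind the watermark and counts as late; with a 1500 ms bound it is still in time. If records in your Kinesis shards can arrive out of order, that is the case where forBoundedOutOfOrderness with a suitable Duration is likely the safer choice.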