It's all in the title. I'd like to run batches off the top of my streaming jobs, and being able to see the watermark as an indicator of when to start would be wonderful.
Is there any way to poll the system watermark of a running Dataflow pipeline?
217 views Asked by mr blobby At
You might be able to accomplish this by using Pub/Sub to publish a signal that triggers whatever external processing you want.
To control the frequency of that signal, you could use a ParDo to filter your records down based on some criterion, which might take the event timestamps into account.
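As a rough sketch of that filtering idea (Beam Python SDK, untested against your pipeline): the predicate below keeps only records whose event time falls in the first minute of each hour. The `event_ts` field, the one-hour interval, and the 60-second slack are all assumptions for illustration, and `beam.Filter` is used here as a convenience wrapper around a ParDo.

```python
def is_signal_candidate(event_ts_seconds: float,
                        interval_seconds: int = 3600,
                        slack_seconds: int = 60) -> bool:
    """Return True when the event time lies in the first `slack_seconds` of an interval."""
    return (event_ts_seconds % interval_seconds) < slack_seconds


def add_signal_filter(records):
    """Keep only records near an interval boundary.

    Assumes each record is a dict with a hypothetical 'event_ts' field
    holding the event time in seconds since the epoch.
    """
    import apache_beam as beam  # imported lazily so the predicate works without Beam

    return records | "FilterSignals" >> beam.Filter(
        lambda rec: is_signal_candidate(rec["event_ts"])
    )
```

This can still pass many records per interval, so whatever consumes the signal would need to tolerate duplicates.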
If you explicitly want to use the watermark, you could try using windowing and triggers so that a record is only produced once the watermark passes the end of each window.
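Here is a minimal sketch of what that could look like in the Beam Python SDK, assuming fixed one-hour windows and a Pub/Sub topic of your choosing (the topic, the window size, and the message format are all hypothetical). It counts the records in each window and publishes one message when the watermark passes the end of that window.

```python
import json


def window_closed_message(window_end_seconds: float, record_count: int) -> bytes:
    """Encode a 'window closed' signal as JSON bytes (Pub/Sub payloads are bytes)."""
    payload = {"window_end": window_end_seconds, "record_count": record_count}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def add_watermark_signal(records, topic: str, window_seconds: int = 3600):
    """Publish one message per fixed window once the watermark passes its end.

    `topic` is a placeholder such as 'projects/<project>/topics/<name>'.
    """
    import apache_beam as beam  # imported lazily so the encoder works without Beam
    from apache_beam.transforms import trigger, window

    def to_message(count, win=beam.DoFn.WindowParam):
        # win.end is the end of the window this count belongs to.
        return window_closed_message(float(win.end), count)

    return (
        records
        | "Window" >> beam.WindowInto(
            window.FixedWindows(window_seconds),
            trigger=trigger.AfterWatermark(),
            accumulation_mode=trigger.AccumulationMode.DISCARDING,
        )
        | "Count" >> beam.CombineGlobally(
            beam.combiners.CountCombineFn()).without_defaults()
        | "ToMessage" >> beam.Map(to_message)
        | "Publish" >> beam.io.WriteToPubSub(topic=topic)
    )
```

Two caveats: a window that receives no records emits nothing, so no signal is published for it, and the message reflects the watermark at this step of the pipeline rather than a job-wide value.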
I don't think there is any explicit way to access the watermark.