We have a use case where we want to process some messages at a later point in time, once certain conditions are met. Is it possible to leave some Pub/Sub messages unacknowledged in an Apache Beam pipeline, so that they become available again after the visibility timeout (the acknowledgement deadline, in Pub/Sub terms) and can be processed later?
Unacknowledge some Pub/Sub messages in an Apache Beam pipeline
Asked by Balasubramanian Naagarajan. There are 2 answers.
As an alternative to @guillaume's suggestion, you can store the "to-be-processed-later" messages (in raw format) in a storage system such as BigQuery or Cloud Bigtable. The pipeline acks every message and then separates them internally: the "valid" messages are processed as usual, while the "invalid" messages are kept in storage for future processing.
Once the processing conditions are satisfied, the "invalid" messages can be retrieved from storage, processed, and then deleted. This is a viable solution in particular when the "invalid" messages only become processable after the Pub/Sub message retention period (7 days by default) has passed, since Pub/Sub itself would no longer hold them by then.
The above workflow is inspired by this section of the Google Cloud blog. I considered the "invalid" messages to be "bad" data.
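The store-and-replay workflow from this answer can be sketched roughly as follows. This is a plain-Python simulation, not Beam code: `DeferredStore` is a hypothetical in-memory stand-in for BigQuery or Cloud Bigtable, and `split_messages` and the `is_ready` predicate are illustrative names, not part of any library. In a real pipeline the split would more likely live inside a transform with two outputs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class DeferredStore:
    """In-memory stand-in for BigQuery or Cloud Bigtable (hypothetical helper)."""
    rows: Dict[str, bytes] = field(default_factory=dict)

    def put(self, message_id: str, raw: bytes) -> None:
        # Keep the raw payload so it can be replayed unchanged later.
        self.rows[message_id] = raw

    def take_ready(self, is_ready: Callable[[bytes], bool]) -> List[bytes]:
        # Return payloads whose condition now holds and delete them from the store.
        ready_ids = [mid for mid, raw in self.rows.items() if is_ready(raw)]
        return [self.rows.pop(mid) for mid in ready_ids]


def split_messages(
    messages: Iterable[Tuple[str, bytes]],
    is_ready: Callable[[bytes], bool],
    store: DeferredStore,
) -> List[bytes]:
    """Return payloads that can be processed now; park the rest in the store."""
    process_now = []
    for message_id, raw in messages:
        if is_ready(raw):
            process_now.append(raw)
        else:
            store.put(message_id, raw)
    return process_now


store = DeferredStore()
batch = [("1", b"ready:a"), ("2", b"wait:b"), ("3", b"ready:c")]
now = split_messages(batch, lambda raw: raw.startswith(b"ready"), store)

# Later, once the external condition has changed, replay what became eligible.
later = store.take_ready(lambda raw: True)
```

All messages in `batch` count as acked from Pub/Sub's point of view; whether a deferred message is ever processed then depends only on the store, so the store has to be durable.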
You can't unack a message with Apache Beam. Once messages are successfully ingested into the pipeline, they are acked automatically.
You can keep them in the pipeline and reprocess them until the conditions are met, but this can cause congestion or waste Dataflow resources. It may be better to filter the messages beforehand, for instance in a Cloud Function that nacks the messages that are not yet valid and publishes the valid ones to a target Pub/Sub topic, which the pipeline then reads from.
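A minimal sketch of that pre-filtering step, assuming a pull-style subscriber whose message objects expose `data`, `ack()` and `nack()` (as the `google-cloud-pubsub` Python client's messages do). `make_callback`, `is_ready`, `publish` and `FakeMessage` are hypothetical names used only for illustration; `FakeMessage` is a test double so the logic can be run without any Google Cloud dependency.

```python
from typing import Callable


def make_callback(is_ready: Callable[[bytes], bool],
                  publish: Callable[[bytes], None]):
    """Build a subscriber callback that forwards ready messages and nacks the rest."""
    def callback(message) -> None:
        if is_ready(message.data):
            publish(message.data)  # forward to the topic the Beam pipeline reads
            message.ack()          # done with this message on the source subscription
        else:
            message.nack()         # hand it back to Pub/Sub for redelivery
    return callback


class FakeMessage:
    """Test double exposing the data/ack/nack surface the callback relies on."""

    def __init__(self, data: bytes):
        self.data = data
        self.state = "pending"

    def ack(self) -> None:
        self.state = "acked"

    def nack(self) -> None:
        self.state = "nacked"


forwarded = []
callback = make_callback(lambda d: d.startswith(b"ready"), forwarded.append)
ok, not_yet = FakeMessage(b"ready:a"), FakeMessage(b"wait:b")
callback(ok)
callback(not_yet)
```

With the real client, a callback of this shape could be passed to `SubscriberClient.subscribe(subscription_path, callback=callback)`, and `publish` could wrap `PublisherClient.publish(topic_path, data).result()` so the ack only happens after the publish succeeds. A nacked message is redelivered according to the subscription's retry policy, so a backoff policy is probably worth configuring to avoid a tight redelivery loop.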