I have a cloud streaming pipeline that read from PubSubIO and which "PipelineOptions" are set with "WorkerMachineType = n1-standard-1". This machine have 3.75GB of memory.
My problem is that if the subscription has a lot of messages, the pipeline reads really fast and when starts to process many elements it doesn't have enough memory.
Is there any form to reduce the quantity of messages read per second? or is the memory consumption related with the time duration assigned to the window and I would reduce this time duration?
Thanks is advance.
It sounds like you may be trying to process too much data with too few workers. We are looking at addressing this and related scenarios, but in the meantime you may want to try dialing down the amount of data you're ingesting, or increasing the number of workers available to the jobs.
You'll also get better performance with n1-standard-4 machines, which is why we make those the default for the streaming runner.