Apache Beam HTTP Unbounded Source Python


Is it possible with the current version of Apache Beam to develop an unbounded source that receives data in an HTTP message? My intention is to run an HTTP server and inject the messages it receives into a Beam pipeline. If it is possible, can it be done with the existing sources?


There is 1 answer

flomalb On

It is possible. You can build it with a Splittable DoFn. The Source API looks like it is going to be deprecated in the near future.

For my part, I am trying to build a pipeline that consumes a REST API which streams JSON messages in the body of a GET response and supports multiple connections, so the workload is split on the API side, as with Adobe Livestream or Twitter. This behaviour should let the consumer end (Dataflow) scale.
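To make the consuming side concrete, here is a minimal sketch of turning a streamed response body into individual messages. It assumes the API sends newline-delimited JSON, which is my assumption and not something every streaming API does; `iter_json_messages` is a hypothetical helper name, and the chunks could come from any HTTP client that exposes the body as an iterator of bytes.

```python
import json
from typing import Any, Iterable, Iterator


def iter_json_messages(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one parsed JSON value per line of a chunked byte stream.

    Assumption: messages are newline-delimited JSON. Chunk boundaries
    may fall anywhere, including in the middle of a message.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the tail
        # in the buffer until more bytes arrive.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():  # skip blank keep-alive lines
                yield json.loads(line)
    # The stream ended without a trailing newline: flush what is left.
    if buffer.strip():
        yield json.loads(buffer)


# Simulated body: the second message is cut across two chunks.
simulated_chunks = [b'{"a": 1}\n{"b"', b': 2}\n', b'\n{"c": 3}']
messages = list(iter_json_messages(simulated_chunks))
```

A generator like this could sit inside whatever function reads one connection, so each message becomes one output element.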

My struggle is that I can't figure out a splittable restriction for this use case. The stream is infinite, and there is no offset as in message brokers like Kafka, nor a byte range as with files. I first wanted to build element/restriction pairs like (url, buffered reader), but I don't think buffered readers can be split.
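One idea I have been toying with, purely as an assumption and not something Beam provides out of the box, is to split over connection slots instead of stream position: if the API allows N parallel connections, the restriction is the half-open range [0, N) of slot indices, and a split hands a sub-range of slots to another worker. The sketch below is plain Python; `SlotRange` and `element_restriction_pairs` are hypothetical names that only imitate how an offset range splits, and the URL is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SlotRange:
    """Hypothetical restriction: half-open range [start, stop) of connection slots."""

    start: int
    stop: int

    def size(self) -> int:
        return self.stop - self.start

    def split(self, slots_per_part: int) -> List["SlotRange"]:
        """Cut the range into consecutive parts of at most slots_per_part slots."""
        if slots_per_part < 1:
            raise ValueError("slots_per_part must be >= 1")
        return [
            SlotRange(lo, min(lo + slots_per_part, self.stop))
            for lo in range(self.start, self.stop, slots_per_part)
        ]


def element_restriction_pairs(
    url: str, max_connections: int, slots_per_part: int
) -> List[Tuple[str, SlotRange]]:
    """Pair the URL with sub-ranges of slots rather than with an open reader."""
    whole = SlotRange(0, max_connections)
    return [(url, part) for part in whole.split(slots_per_part)]


# Placeholder URL; four allowed connections, one slot per pair.
pairs = element_restriction_pairs("https://example.invalid/stream", 4, 1)
```

With pairs like these, the reader would be opened inside the processing step for each slot, so nothing unsplittable such as a buffered reader has to be part of the restriction. Whether this maps cleanly onto a Splittable DoFn for an infinite stream is exactly the part I have not worked out.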

One solution might be not to provide a restriction at all, but then I struggle to imagine how the pipeline would distribute elements and hence scale.