How can I generate a histogram on a large bounded dataset with Apache Beam?

58 views Asked by doc_56 At 19 February 2024 at 16:27

I'm writing an Apache Beam pipeline that transforms a raw dataset into a canonical schema defined with Google's Protocol Buffers. I then compute some metrics for each data instance and save them to the proto object as well.

Now, for each computed metric, I want to extract a histogram that describes the distribution of that metric across the dataset. How can I do that in Beam?

I see that there is a histogram metric implementation in the Python SDK, but it is for internal use only and is not supported by the runners. Is there a workaround for this?
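To make the goal concrete, here is a rough sketch of the kind of result I'm after, written with ordinary transforms instead of the internal histogram metric. It assumes the metric's range (`lo`, `hi`) and the number of bins are known ahead of time, which may not hold for every metric. The names `bin_index`, `histogram`, and `demo` are hypothetical helpers, not part of Beam; the only Beam calls used are `beam.Map`, `beam.Create`, and `beam.combiners.Count.PerElement`.

```python
import math


def bin_index(value, lo, hi, num_bins):
    """Return the fixed-width bin index for value.

    Values outside [lo, hi] are clamped into the first or last bin.
    """
    if math.isnan(value):
        raise ValueError("cannot bin NaN")
    if value <= lo:
        return 0
    if value >= hi:
        return num_bins - 1
    width = (hi - lo) / num_bins
    # min() guards against float rounding pushing a value past the last bin.
    return min(int((value - lo) / width), num_bins - 1)


def histogram(values, name, lo, hi, num_bins):
    """Turn a PCollection of floats into a PCollection of (bin_index, count).

    `name` only makes the step labels unique when several metrics are binned
    in the same pipeline.
    """
    # Imported lazily so bin_index can be unit-tested without Beam installed.
    import apache_beam as beam

    return (
        values
        | f"{name}/ToBin" >> beam.Map(bin_index, lo, hi, num_bins)
        | f"{name}/CountPerBin" >> beam.combiners.Count.PerElement()
    )


def demo():
    """Run the sketch on a tiny in-memory dataset and print the bin counts."""
    import apache_beam as beam

    with beam.Pipeline() as p:
        values = p | beam.Create([0.05, 0.12, 0.48, 0.51, 0.97])
        histogram(values, "score", 0.0, 1.0, 10) | beam.Map(print)
```

Calling `demo()` prints one `(bin_index, count)` pair per non-empty bin; empty bins do not appear in the output and would have to be filled in afterwards. In the real pipeline, each metric would first be extracted from the proto object with a `beam.Map` before being passed to `histogram`.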


There are 0 answers
