I'm using Apache Storm 2.4.0 and we want all of our metrics to be exposed. So we created a service that collects the incoming metrics and exposes them on a /metrics endpoint. Storm uses Dropwizard metrics, and I convert them to Prometheus metrics as part of this process. The whole setup runs on Kubernetes, after a few changes to the Storm 2.4.0 code itself. On the Kubernetes side, I have separate Nimbus, Supervisor, and Zookeeper pods, all connected to each other (I'm not sure whether some of these should live in the same pod).
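For reference, the conversion itself is roughly the following (a minimal sketch using the Prometheus simpleclient_dropwizard bridge; the class name and metric name here are made up for illustration):

```java
import com.codahale.metrics.MetricRegistry;

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.dropwizard.DropwizardExports;
import io.prometheus.client.exporter.common.TextFormat;

import java.io.StringWriter;

public class DropwizardToPrometheus {
    public static void main(String[] args) throws Exception {
        // Dropwizard registry; in the real reporter this comes from Storm's metrics subsystem
        MetricRegistry dropwizardRegistry = new MetricRegistry();
        dropwizardRegistry.counter("example_tuples_acked").inc(42);

        // Bridge the Dropwizard metrics into a Prometheus CollectorRegistry
        CollectorRegistry prometheusRegistry = new CollectorRegistry();
        new DropwizardExports(dropwizardRegistry).register(prometheusRegistry);

        // Render the Prometheus text exposition format, i.e. what a /metrics endpoint serves
        StringWriter writer = new StringWriter();
        TextFormat.write004(writer, prometheusRegistry.metricFamilySamples());
        System.out.println(writer);
    }
}
```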
```yaml
topology.metrics.reporters:
  # Prometheus Reporter
  - class: "com.example.storm.PrometheusStormReporter"
    daemons:
      - "supervisor"
      - "nimbus"
      - "worker"
    report.period: 60
    report.period.units: "SECONDS"

storm.metrics.reporters:
  # Prometheus Reporter
  - class: "com.example.PrometheusStormReporter"
    daemons:
      - "supervisor"
      - "nimbus"
      - "worker"
    report.period: 60
    report.period.units: "SECONDS"
```
The reporter above makes a POST call to our service, where all the metrics are collected.
I have converted the Dropwizard metrics to Prometheus metrics and POST them to our service. I also reduced the report period from 60 seconds down to 10 and then 5 seconds, but there is still a large difference between what Storm is actually processing and what we see on the /metrics endpoint.
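The push side of the reporter works roughly like this (a sketch built on a plain Dropwizard ScheduledReporter and Java 11's HttpClient; the class name and the service URL are placeholders, not the actual implementation):

```java
import com.codahale.metrics.*;

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.dropwizard.DropwizardExports;
import io.prometheus.client.exporter.common.TextFormat;

import java.io.StringWriter;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.SortedMap;
import java.util.concurrent.TimeUnit;

// On every report period, convert the registry and POST the text format to the collector service.
public class PrometheusPushReporter extends ScheduledReporter {
    private final MetricRegistry registry;
    private final HttpClient http = HttpClient.newHttpClient();

    public PrometheusPushReporter(MetricRegistry registry) {
        super(registry, "prometheus-push", MetricFilter.ALL, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
        this.registry = registry;
    }

    @Override
    public void report(SortedMap<String, Gauge> gauges, SortedMap<String, Counter> counters,
                       SortedMap<String, Histogram> histograms, SortedMap<String, Meter> meters,
                       SortedMap<String, Timer> timers) {
        try {
            // Ignore the snapshot maps and export the whole registry via the bridge
            CollectorRegistry prom = new CollectorRegistry();
            new DropwizardExports(registry).register(prom);

            StringWriter body = new StringWriter();
            TextFormat.write004(body, prom.metricFamilySamples());

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://metrics-service:8080/ingest"))   // placeholder URL
                    .header("Content-Type", TextFormat.CONTENT_TYPE_004)
                    .POST(HttpRequest.BodyPublishers.ofString(body.toString()))
                    .build();
            http.send(request, HttpResponse.BodyHandlers.discarding());
        } catch (Exception e) {
            // A failed push here only loses one report interval
            e.printStackTrace();
        }
    }
}
```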
Another way I tried is to push a metric exactly when I want it, inside the execute method of the bolts (sequentially with my current logic). That way I report exactly the metrics I need and move them along accordingly, but it adds to the processing time (which I don't want), and if the POST fails I have to fail the tuple explicitly and set up a Storm retry for it.
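For context, that in-execute push looked roughly like this (a sketch assuming an anchored bolt and a blocking HTTP POST per tuple; the URL and metric name are placeholders):

```java
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

public class CountingBolt extends BaseRichBolt {
    private OutputCollector collector;
    private transient HttpClient http;   // not serializable, so build it in prepare()

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.http = HttpClient.newHttpClient();
    }

    @Override
    public void execute(Tuple tuple) {
        // ... business logic ...
        try {
            // Blocking POST of the metric; this sits on the bolt's critical path,
            // so every tuple pays the HTTP round-trip.
            HttpRequest req = HttpRequest.newBuilder()
                    .uri(URI.create("http://metrics-service:8080/ingest"))   // placeholder URL
                    .POST(HttpRequest.BodyPublishers.ofString("processed_total 1"))
                    .build();
            http.send(req, HttpResponse.BodyHandlers.discarding());
            collector.ack(tuple);
        } catch (Exception e) {
            // If the push fails, fail the tuple so Storm replays it.
            collector.fail(tuple);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output stream in this sketch
    }
}
```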
Do I need to switch to Pushgateway? Or is there another way I can get the full metrics?
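If Pushgateway is the way to go, I assume the push would look roughly like this (a sketch with the official simpleclient_pushgateway client; the address and job name are placeholders):

```java
import com.codahale.metrics.MetricRegistry;

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.dropwizard.DropwizardExports;
import io.prometheus.client.exporter.PushGateway;

public class PushGatewayExample {
    public static void main(String[] args) throws Exception {
        MetricRegistry dropwizardRegistry = new MetricRegistry();
        dropwizardRegistry.counter("example_tuples_acked").inc();

        CollectorRegistry prom = new CollectorRegistry();
        new DropwizardExports(dropwizardRegistry).register(prom);

        // Push the whole registry; Prometheus then scrapes the Pushgateway instead of my service.
        PushGateway pushGateway = new PushGateway("pushgateway.monitoring:9091");   // placeholder address
        pushGateway.pushAdd(prom, "storm_worker");   // placeholder job name
    }
}
```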