I'm trying to integrate Spark with Prometheus. We run both Spark 2 and Spark 3. For Spark 2 I know I can run jmx_exporter. Spark 3 has a new built-in PrometheusServlet, which is great. We run Spark on-prem on YARN, not Kubernetes.
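For reference, this is roughly how I'm enabling the servlet on Spark 3 (a sketch based on the metrics sink config from the Spark docs; the path is one I chose and can be changed):

```properties
# conf/metrics.properties (Spark 3.0+): expose driver metrics in
# Prometheus format on the Spark UI port
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
```

Together with `spark.ui.prometheus.enabled=true`, this also exposes executor metrics at `/metrics/executors/prometheus` on the driver UI.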
My question is: how do I dynamically discover Prometheus scrape targets? As I understand it, there is no single, static, central Spark server to point Prometheus at; instead, each app gets packed into a YARN container and exposes its own metrics. Unless there is a way to aggregate these metrics (e.g. in the Spark History Server), or to give each job a static, predictable address?
When I submit a long-running Spark Streaming app, I'd like its metrics to show up in Prometheus out of the box. I know the new PrometheusServlet has autodiscovery for Kubernetes via pod annotations; I'd like to achieve something similar for YARN.
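Concretely, I'd be submitting the streaming job roughly like this (a sketch; the app file name is a placeholder, and the sink is configured via `spark.metrics.conf.*` so I don't have to ship a metrics.properties file):

```bash
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.ui.prometheus.enabled=true \
  --conf "spark.metrics.conf.*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet" \
  --conf "spark.metrics.conf.*.sink.prometheusServlet.path=/metrics/prometheus" \
  my_streaming_app.py
```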
What I've found so far:
- I could have Prometheus scrape a Pushgateway and have my app push metrics there when running spark-submit; I found a custom sink that does exactly that. However, the Pushgateway introduces its own problems (pushed metrics go stale until deleted explicitly, and the `up` metric reflects the gateway rather than the app), so I was hoping to avoid it. The scrape side would look like the first sketch below.
- Use Prometheus's file-based service discovery mechanism (file_sd_configs) to add targets. But how do I do that automatically, without manually editing a JSON file every time I submit a new job? I found that Prometheus has no API for adding targets, and writing a job that rewrites a JSON file on a remote host whenever I run spark-submit feels hacky (see the sidecar sketch after this list).
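On the first option, the Prometheus side of scraping the Pushgateway would look something like this (sketch; host and port are placeholders):

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true  # preserve job/instance labels pushed by the Spark apps
    static_configs:
      - targets: ['pushgateway.example.com:9091']
```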
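On the second option, the least hacky thing I can think of is a small sidecar that polls the YARN ResourceManager REST API for running Spark apps and rewrites the file_sd JSON; Prometheus reloads that file on change, so nothing needs to touch Prometheus itself. A sketch, assuming the RM address and output path are placeholders and that scraping the driver UI through the YARN web proxy (`/proxy/<app-id>/`) works with the servlet path configured above:

```python
#!/usr/bin/env python3
"""Poll the YARN ResourceManager for running Spark apps and write a
Prometheus file_sd target list. Sketch only: the RM address, output
path, and metrics path are assumptions to adapt."""
import json
import os
import time
import urllib.request

RM_WEB = "resourcemanager.example.com:8088"  # placeholder
APPS_API = f"http://{RM_WEB}/ws/v1/cluster/apps?states=RUNNING&applicationTypes=SPARK"
TARGETS_FILE = "/etc/prometheus/file_sd/spark_targets.json"  # placeholder

def running_spark_apps():
    req = urllib.request.Request(APPS_API, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        apps = json.load(resp).get("apps") or {}
    return apps.get("app") or []

def to_target_group(app):
    # In cluster mode the driver UI sits behind the YARN web proxy at
    # /proxy/<application-id>/, so scrape the RM address with a per-app
    # __metrics_path__ pointing at the PrometheusServlet path.
    return {
        "targets": [RM_WEB],
        "labels": {
            "yarn_application_id": app["id"],
            "spark_app_name": app["name"],
            "__metrics_path__": f"/proxy/{app['id']}/metrics/prometheus",
        },
    }

def main():
    while True:
        try:
            groups = [to_target_group(a) for a in running_spark_apps()]
            tmp = TARGETS_FILE + ".tmp"
            with open(tmp, "w") as f:
                json.dump(groups, f, indent=2)
            os.replace(tmp, TARGETS_FILE)  # atomic swap: Prometheus never reads a partial file
        except Exception as exc:
            print(f"target refresh failed: {exc}")
        time.sleep(30)

if __name__ == "__main__":
    main()
```

With the matching piece in prometheus.yml:

```yaml
scrape_configs:
  - job_name: 'spark-on-yarn'
    file_sd_configs:
      - files: ['/etc/prometheus/file_sd/spark_targets.json']
```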
Any suggestions for an elegant solution are welcome. Thank you!