[DESCRIPTION]
I am running Flink 1.11.1 on Kubernetes and have set up a monitoring stack using Prometheus and Grafana.
I have observed that running the WordCount example on the Flink cluster (submitted via the UI) does not return $(job_name) on Prometheus.
To troubleshoot, I downloaded the Flink sample WordCount jobs and forced them to run longer using Thread.sleep(). As the screenshot below shows, I ran the original job first, then the longer version.
shorter and longer running screenshot
Only the second run (the longer job) exports the $(job_name) field to Prometheus, as seen in the Grafana dashboard screenshot below (label_values(job_name)). This suggests that shorter-running jobs do not export the field.
job name field on Grafana dashboard
I have also tried running the Pushgateway reporter with Flink's suggested settings, which gave the same result as above.
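For context, a PushGateway reporter block in flink-conf.yaml generally looks something like the sketch below. This is only an illustration based on the reporter options documented for Flink 1.11; the host, port, and jobName values are placeholders, and it is an assumption that the settings used here matched it.

```yaml
# Sketch only: host, port, and jobName are placeholder values (assumptions).
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: localhost
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: myJob
# Appends a random suffix so several Flink processes do not overwrite each other's metrics.
metrics.reporter.promgateway.randomJobNameSuffix: true
# When false, pushed metrics stay on the PushGateway after the Flink process shuts down.
metrics.reporter.promgateway.deleteOnShutdown: false
```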
[QUESTION]
Is there a way to collect the job_name metric from short-running jobs, or is my setup wrong? Or is this impossible because of the Prometheus scrape interval? Thank you.