I'm trying to submit a Spark application using the Spark Operator and expose its metrics through the JMX Prometheus exporter. I'm using Spark 3.1.1 and Spark Operator v1beta2-1.3.3-3.1.1. Here is a snippet from the SparkApplication configuration:
monitoring:
  exposeDriverMetrics: true
  exposeExecutorMetrics: true
  prometheus:
    jmxExporterJar: "/opt/spark/jars/jmx_prometheus_javaagent-0.11.0.jar"
    port: 8090
    configFile: "/opt/spark/work-dir/prometheus/prometheus.yaml"
driver:
  cores: 1
  coreLimit: "1200m"
  memory: "2g"
  javaOptions: "-Dconfig.file=/opt/spark/work-dir/conf/application_app.conf -Dlog4j.configuration=file:///opt/spark/work-dir/log/log4j_app.properties"
  labels:
    version: 3.1.1
  serviceAccount: spark-jobs-spark
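My understanding is that with this monitoring section the operator should inject the JMX exporter as a Java agent into the driver JVM, i.e. something roughly equivalent to the option below (the exact mechanism is my assumption; the jar path, port, and config file are copied from the spec above):

# Roughly the driver JVM option I expect the operator to inject
# (assumption on my side; values taken from the monitoring section above)
-javaagent:/opt/spark/jars/jmx_prometheus_javaagent-0.11.0.jar=8090:/opt/spark/work-dir/prometheus/prometheus.yaml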
The application doesn't expose any metrics from the driver. Below you can find the JMX exporter configuration (prometheus.yaml):
# These come from the application driver if it's a streaming application
# Example: default/streaming.driver.com.example.ClassName.StreamingMetrics.streaming.lastCompletedBatch_schedulingDelay
- pattern: metrics<name=(\S+)\.(\S+)\.driver\.(\S+)\.StreamingMetrics\.streaming\.(\S+)><>Value
  name: spark_streaming_driver_$4
  labels:
    app_namespace: "$1"
    app_id: "$2"
- pattern: metrics<name=(\S+)\.(\S+)\.spark\.streaming\.(\S+)\.(\S+)><>Value
  name: streaming_query
  labels:
    job: "$1"
    instance: "$2"
    query: "$3"
    name: "$4"
# These come from the application driver if it's a structured streaming application
# Example: default/streaming.driver.spark.streaming.QueryName.inputRate-total
- pattern: metrics<name=(\S+)\.(\S+)\.driver\.spark\.streaming\.(\S+)\.(\S+)><>Value
  name: spark_structured_streaming_driver_$4
  labels:
    app_namespace: "$1"
    app_id: "$2"
    query_name: "$3"
# These come from the application executors
# Example: default/spark-pi.0.executor.threadpool.activeTasks
- pattern: metrics<name=(\S+)\.(\S+)\.(\S+)\.executor\.(\S+)><>Value
  name: spark_executor_$4
  type: GAUGE
  labels:
    app_namespace: "$1"
    app_id: "$2"
    executor_id: "$3"
# These come from the application driver
# Example: default/spark-pi.driver.DAGScheduler.stage.failedStages
- pattern: metrics<name=(\S+)\.(\S+)\.driver\.(BlockManager|DAGScheduler|jvm)\.(\S+)><>Value
  name: spark_driver_$3_$4
  type: GAUGE
  labels:
    app_namespace: "$1"
    app_id: "$2"
# These come from the application driver
# Emulate timers for DAGScheduler like messageProcessingTime
- pattern: metrics<name=(\S+)\.(\S+)\.driver\.DAGScheduler\.(.*)><>Count
  name: spark_driver_DAGScheduler_$3_count
  type: COUNTER
  labels:
    app_namespace: "$1"
    app_id: "$2"
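To rule out a scraping-side problem, the metrics endpoint can also be probed directly on the driver pod, for example like this (<driver-pod-name> is a placeholder for the actual pod name):

# Forward the configured metrics port from the driver pod
kubectl port-forward pod/<driver-pod-name> 8090:8090
# In a second terminal, check whether the agent serves anything
curl http://localhost:8090/metrics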
NOTE: Here is the interesting part: if I run spark-shell with the JMX exporter attached as a -javaagent, I can find the metrics, but not when the application is launched through spark-submit. What am I missing?
I tried running the JMX exporter with spark-shell inside the pod and it worked. I have no idea why it isn't working from spark-submit.
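For reference, the spark-shell run that does expose metrics looks roughly like this (the jar and config paths are the ones from my spec above):

# Works: metrics are served on port 8090 when the agent is attached this way
spark-shell \
  --conf "spark.driver.extraJavaOptions=-javaagent:/opt/spark/jars/jmx_prometheus_javaagent-0.11.0.jar=8090:/opt/spark/work-dir/prometheus/prometheus.yaml"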