We have an OpenShift cluster in which the Prometheus Operator monitoring stack is installed. We would like to probe the `actuator/health` endpoints of Spring Boot applications using blackbox exporter.
Here's what I've done so far:
I deployed blackbox exporter in the namespace we use for the Prometheus Operator. The `Service` and `ConfigMap` are ready, an `http_2xx` module is defined in the `ConfigMap`, and the exporter is running. I have 2 namespaces (or projects), each with one application deployed in it; both run the same app. I created a `Probe` in one namespace and a `ServiceMonitor` in the other. The `Probe` uses a `staticConfig` to probe the target, while the `ServiceMonitor` uses labels to do this dynamically.
My problem is that every probe attempt fails.
The serviceMonitor log says the following:
`level=info msg="Invalid HTTP response status code, wanted 2xx" status_code=400`
I'm pretty sure this happens because these are HTTPS endpoints, but if I add a `scheme: https` line to the serviceMonitor config, it still doesn't work.
The Probe says the following:
`level=error msg="Error for HTTP request" err="Get \"https://appIP:port/actuator/health\": tls: failed to verify certificate: x509: certificate signed by unknown authority"`
So far I have only tried to make the Probe work; I have no clue what to do with the serviceMonitor.
I tried giving the Probe a service CA to work with, and that did not work. I also gave it the cert and key used by the app, and it still reports the same error.
Any idea what I should do? Configs below.
You'll notice Probe config does not have a ca object right now, but it gave the same log.
I'd really appreciate if someone could help me sort this out, it's driving me crazy :D
(note: `tlsConfig: insecureSkipVerify: true` does not skip the verification process, which is weird)
Blackbox exporter yaml:

    data:
      blackbox.yaml: |
        modules:
          http_2xx:
            http:
              no_follow_redirects: true
              method: GET
              preferred_ip_protocol: ip4
              valid_http_versions:
                - HTTP/1.1
                - HTTP/2
              valid_status_codes: []
              tls_config:
                insecure_skip_verify: true
            prober: http
            timeout: 10s
serviceMonitor yaml:

    spec:
      endpoints:
        - interval: 30s
          params:
            module:
              - http_2xx
          path: /probe
          relabelings:
            - action: replace
              sourceLabels:
                - __address__
              targetLabel: __param_target
            - action: replace
              replacement: 'exporter:port'
              targetLabel: __address__
            - action: replace
              sourceLabels:
                - __param_target
              targetLabel: instance
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
          scrapeTimeout: 10s
      jobLabel: jobLabel
      selector:
        matchLabels:
          app.kubernetes.io/component: component
Probe yaml:

    spec:
      interval: 30s
      module: http_2xx
      prober:
        path: /probe
        url: 'exporter.namespace.svc:port'
      targets:
        staticConfig:
          static:
            - 'https://app.namespace.svc:port/actuator/health'
      tlsConfig:
        cert:
          secret:
            key: key
            name: secret-name
        keySecret:
          key: key
          name: secret-name
Manually invoking blackbox exporter says this:
Logs for the probe:
ts=2023-12-07T10:24:46.576847865Z caller=main.go:181 module=http_2xx target=https://app.namespace.svc:port level=info msg="Beginning probe" probe=http timeout_seconds=119.5
ts=2023-12-07T10:24:46.576945405Z caller=http.go:328 module=http_2xx target=https://app.namespace.svc:port level=info msg="Resolving target address" target=app.namespace.svc ip_protocol=ip4
ts=2023-12-07T10:24:46.615450737Z caller=http.go:328 module=http_2xx target=https://app.namespace.svc:port level=info msg="Resolved target address" target=app.namespace.svc ip=IP_of_service
ts=2023-12-07T10:24:46.615543908Z caller=client.go:252 module=http_2xx target=https://app.namespace.svc:port level=info msg="Making HTTP request" url=https://IPaddress:port host=app.namespace.svc:port
ts=2023-12-07T10:24:46.624148963Z caller=handler.go:120 module=http_2xx target=https://app.namespace.svc:port level=error msg="Error for HTTP request" err="Get \"https://IPaddress:port\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
ts=2023-12-07T10:24:46.624187979Z caller=handler.go:120 module=http_2xx target=https://app.namespace.svc:port level=info msg="Response timings for roundtrip" roundtrip=0 start=2023-12-07T10:24:46.618548821Z dnsDone=2023-12-07T10:24:46.618548821Z connectDone=2023-12-07T10:24:46.619955324Z gotConn=0001-01-01T00:00:00Z responseStart=0001-01-01T00:00:00Z tlsStart=2023-12-07T10:24:46.619998796Z tlsDone=2023-12-07T10:24:46.624134551Z end=0001-01-01T00:00:00Z
ts=2023-12-07T10:24:46.62420857Z caller=main.go:181 module=http_2xx target=https://app.namespace.svc:port level=error msg="Probe failed" duration_seconds=0.047321017
I faced a similar issue and solved it by:

1. Adding the CA certificate to the `/etc/pki/ca-trust/source/anchors/` directory
2. Running `update-ca-trust`
3. Updating the `blackbox.yml` configuration file
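
For the `blackbox.yml` part, the TLS settings that blackbox exporter applies when it connects to the probed target live under the module's `http.tls_config`. Below is a minimal sketch of a module that trusts a specific CA. It assumes the CA certificate that signed the application's serving certificate is mounted into the exporter pod; the path `/etc/blackbox/ca/service-ca.crt` is hypothetical and only for illustration.

```yaml
# Sketch only: the ca_file path below is a hypothetical mount location,
# not something taken from the question or the answer above.
modules:
  http_2xx:
    prober: http
    timeout: 10s
    http:
      method: GET
      preferred_ip_protocol: ip4
      tls_config:
        # CA bundle used to verify the target's serving certificate
        ca_file: /etc/blackbox/ca/service-ca.crt
        # Verification stays enabled, so an untrusted certificate still fails the probe
        insecure_skip_verify: false
```

As far as I understand, the `tlsConfig` on a `Probe` or `ServiceMonitor` only covers the connection from Prometheus to the exporter, which might explain why `insecureSkipVerify: true` there has no effect on the exporter's own request to the application. The exporter also has to reload or restart before a changed module takes effect.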