Why can't Prometheus blackbox exporter verify a TLS endpoint's self-signed certificate? Details below


We have an OpenShift cluster with the Prometheus Operator monitoring stack installed. We would like to probe the actuator/health endpoints of our Spring Boot applications using the blackbox exporter.

Here's what I've done so far:

I deployed the blackbox exporter in the namespace we use for the Prometheus Operator. The Service and ConfigMap are ready, an http_2xx module is defined in the ConfigMap, and the exporter is running. I have two namespaces (projects), each with one instance of the same application deployed. I created a Probe in one namespace and a ServiceMonitor in the other. The Probe uses a staticConfig to define its target, while the ServiceMonitor discovers its targets dynamically through labels.

My problem is that every probe attempt fails.

The log for the ServiceMonitor's probes says the following:

    level=info msg="Invalid HTTP response status code, wanted 2xx" status_code=400

I'm fairly sure this happens because these are HTTPS endpoints, but adding a "scheme: https" line to the ServiceMonitor config doesn't fix it.

The Probe says the following:

    level=error msg="Error for HTTP request" err="Get \"https://appIP:port/actuator/health\": tls: failed to verify certificate: x509: certificate signed by unknown authority"

So far I have only tried to make the Probe work; I have no idea what to do with the ServiceMonitor.

I tried giving the Probe the service CA, but that did not work. I also gave it the certificate and key used by the app; that did not work either, and the log message stayed the same.

Any idea what I should do? Configs below.

You'll notice the Probe config does not have a ca object right now; when it had one, it produced the same log.

I'd really appreciate it if someone could help me sort this out, it's driving me crazy :D

(Note: setting tlsConfig: insecureSkipVerify: true on the Probe does not skip the verification either, which is weird.)

Blackbox exporter ConfigMap:

data:
  blackbox.yaml: |
    modules:
      http_2xx:
        http:
          no_follow_redirects: true
          method: GET
          preferred_ip_protocol: ip4
          valid_http_versions:
          - HTTP/1.1
          - HTTP/2
          valid_status_codes: []
          tls_config:
            insecure_skip_verify: true
        prober: http
        timeout: 10s

ServiceMonitor YAML:

spec:
  endpoints:
    - interval: 30s
      params:
        module:
          - http_2xx
      path: /probe
      relabelings:
        - action: replace
          sourceLabels:
            - __address__
          targetLabel: __param_target
        - action: replace
          replacement: 'exporter:port'
          targetLabel: __address__
        - action: replace
          sourceLabels:
            - __param_target
          targetLabel: instance
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
      scrapeTimeout: 10s
  jobLabel: jobLabel
  selector:
    matchLabels:
      app.kubernetes.io/component: component

Probe YAML:

spec:
  interval: 30s
  module: http_2xx
  prober:
    path: /probe
    url: 'exporter.namespace.svc:port'
  targets:
    staticConfig:
      static:
        - 'https://app.namespace.svc:port/actuator/health'
  tlsConfig:
    cert:
      secret:
        key: key
        name: secret-name
    keySecret:
      key: key
      name: secret-name
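One detail that may explain the Probe result: in the Prometheus Operator, `Probe.spec.tlsConfig` configures the TLS connection Prometheus uses to scrape the prober (the exporter at `prober.url`), not the connection from the exporter to the probed target. The failing certificate check is done by the exporter, so trust has to be configured in the module's `tls_config`. A minimal sketch, where the module name and the CA mount path are hypothetical:

```yaml
modules:
  http_2xx_custom_ca:          # hypothetical module name
    prober: http
    timeout: 10s
    http:
      preferred_ip_protocol: ip4
      tls_config:
        # Hypothetical path where the CA that signed the app's
        # certificate is mounted inside the exporter container.
        ca_file: /etc/blackbox-exporter/ca/service-ca.crt
```

The Probe would then set `module: http_2xx_custom_ca`. After the ConfigMap changes, the exporter must also reload its config (pod restart, SIGHUP, or an HTTP POST to `/-/reload`). In the manual run below, timeout_seconds=119.5 is reported although the module sets timeout: 10s, which may indicate that the running exporter had not loaded this version of the config, and would also explain why insecure_skip_verify: true had no effect.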

Invoking the blackbox exporter manually gives this output:

Logs for the probe:
ts=2023-12-07T10:24:46.576847865Z caller=main.go:181 module=http_2xx target=https://app.namespace.svc:port level=info msg="Beginning probe" probe=http timeout_seconds=119.5
ts=2023-12-07T10:24:46.576945405Z caller=http.go:328 module=http_2xx target=https://app.namespace.svc:port level=info msg="Resolving target address" target=app.namespace.svc ip_protocol=ip4
ts=2023-12-07T10:24:46.615450737Z caller=http.go:328 module=http_2xx target=https://app.namespace.svc:port level=info msg="Resolved target address" target=app.namespace.svc ip=IP_of_service
ts=2023-12-07T10:24:46.615543908Z caller=client.go:252 module=http_2xx target=https://app.namespace.svc:port level=info msg="Making HTTP request" url=https://IPaddress:port host=app.namespace.svc:port
ts=2023-12-07T10:24:46.624148963Z caller=handler.go:120 module=http_2xx target=https://app.namespace.svc:port level=error msg="Error for HTTP request" err="Get \"https://IPaddress:port\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
ts=2023-12-07T10:24:46.624187979Z caller=handler.go:120 module=http_2xx target=https://app.namespace.svc:port level=info msg="Response timings for roundtrip" roundtrip=0 start=2023-12-07T10:24:46.618548821Z dnsDone=2023-12-07T10:24:46.618548821Z connectDone=2023-12-07T10:24:46.619955324Z gotConn=0001-01-01T00:00:00Z responseStart=0001-01-01T00:00:00Z tlsStart=2023-12-07T10:24:46.619998796Z tlsDone=2023-12-07T10:24:46.624134551Z end=0001-01-01T00:00:00Z
ts=2023-12-07T10:24:46.62420857Z caller=main.go:181 module=http_2xx target=https://app.namespace.svc:port level=error msg="Probe failed" duration_seconds=0.047321017

Answer by user23328741:

I faced a similar issue:

"Get "https://IPaddress:port": tls: failed to verify certificate: x509: certificate signed by unknown authority"

I solved it by:

  1. Adding the root and intermediate certificates that were missing to the /etc/pki/ca-trust/source/anchors/ directory
  2. Running update-ca-trust
  3. Adding the following module, whose tls_config.ca_file points at the updated system CA bundle, to the modules section of the blackbox.yml configuration file:
    modules:
      https_2xx:
        prober: http
        timeout: 5s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
          valid_status_codes: []  # Defaults to 2xx
          method: GET
          fail_if_ssl: false
          fail_if_not_ssl: true
          preferred_ip_protocol: "ip4" # defaults to "ip6"
          ip_protocol_fallback: false  # no fallback to "ip6"
          tls_config:
            ca_file: /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
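In a containerized setup such as OpenShift, steps 1 and 2 would have to happen inside the exporter's image. An alternative sketch is to mount a CA bundle into the exporter pod. Assuming the application certificates are issued by the OpenShift service CA, a ConfigMap annotated with `service.beta.openshift.io/inject-cabundle: "true"` gets that CA injected under the key `service-ca.crt`. The ConfigMap name, namespace, container name, and mount path below are hypothetical:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: blackbox-service-ca          # hypothetical name
  namespace: monitoring-namespace    # hypothetical namespace
  annotations:
    # OpenShift injects the service CA as data key service-ca.crt
    service.beta.openshift.io/inject-cabundle: "true"
---
# Fragment of the blackbox exporter Deployment (not a complete manifest)
spec:
  template:
    spec:
      containers:
        - name: blackbox-exporter    # hypothetical container name
          volumeMounts:
            - name: service-ca
              mountPath: /etc/blackbox-exporter/ca
              readOnly: true
      volumes:
        - name: service-ca
          configMap:
            name: blackbox-service-ca
```

The module's ca_file would then be /etc/blackbox-exporter/ca/service-ca.crt instead of the host bundle path. For a certificate the app signed itself, the app's certificate (never its private key) would be mounted in place of the service CA.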