Python prometheus_client not publishing metrics to Grafana Cloud

I am new to Grafana Cloud and Prometheus. I am trying to publish metrics from a Python prometheus_client application (running on a Google Compute instance) to Grafana Cloud, but I can't find the metrics I am publishing.

What am I missing here?

Answer edit: It turns out I needed the static_configs targets. I also needed to call start_http_server in the main Python script and create the Summary in the same script that records the metrics. After that, the agent started publishing to the cloud immediately. I don't understand why.

from prometheus_client import Summary, CollectorRegistry, start_http_server
from time import perf_counter
c_registry = CollectorRegistry()
api_hits_summary = Summary('resp_time','API calls', ['endpoint'], registry=c_registry)
start_http_server(8000)

...

st = perf_counter()
_, raw_resp = self._h.request(url)
api_hits_summary.labels(endpoint='S').observe(perf_counter()-st)

I ran curl localhost:8000 in the shell, but I don't see any resp_time_created lines in the output.
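One possible reason for the missing lines, going only by the snippet above: the Summary is registered on a custom CollectorRegistry, while start_http_server(8000) serves the default registry unless a registry argument is passed. The sketch below assumes that is the cause and passes the same registry to both. The helper names build_metrics and timed_call are hypothetical, and the lambda stands in for the real API request.

```python
from time import perf_counter

from prometheus_client import (
    CollectorRegistry,
    Summary,
    generate_latest,
    start_http_server,
)


def build_metrics():
    """Create a custom registry and a Summary registered on it."""
    registry = CollectorRegistry()
    summary = Summary('resp_time', 'API calls', ['endpoint'], registry=registry)
    return registry, summary


def timed_call(summary, endpoint, func):
    """Run func() and record its duration under the given endpoint label."""
    start = perf_counter()
    result = func()
    summary.labels(endpoint=endpoint).observe(perf_counter() - start)
    return result


if __name__ == "__main__":
    registry, api_hits_summary = build_metrics()
    # Serve the SAME registry the Summary lives on; without registry=...,
    # the server exposes the default registry, which has no resp_time series.
    start_http_server(8000, registry=registry)
    # Placeholder for the real API call (assumption for illustration).
    timed_call(api_hits_summary, 'S', lambda: sum(range(1000)))
    # Print what the endpoint would serve. The server runs in a daemon
    # thread, so a real application has to keep running to stay scrapeable.
    print(generate_latest(registry).decode())
```

An alternative with the same effect is to drop the custom registry entirely, so that both the Summary and start_http_server fall back to the default registry.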

I think the agent metrics are available in the cloud datasource. prometheus_wal_watcher_current_segment and prometheus_tsdb_wal_segment_current metrics match exactly.

I can't see any errors except a few warnings.

May 07 11:49:53 instance-2 grafana-agent[17804]: ts=2023-05-07T15:49:53.111085331Z caller=wal.go:409 level=info agent=prometheus instance=<I removed instance-id> msg="series GC completed" duration=2.816839ms
May 07 11:49:53 instance-2 grafana-agent[17804]: ts=2023-05-07T15:49:53.112838837Z caller=checkpoint.go:100 level=info agent=prometheus instance=<I removed instance-id> msg="Creating checkpoint" from_segment=46 to_segment=49 mint=1683474270000
May 07 11:49:53 instance-2 grafana-agent[17804]: ts=2023-05-07T15:49:53.146233662Z caller=cleaner.go:203 level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=/var/lib/grafana-agent/.cache err="unable to open WAL: open /var/lib/grafana-agent/.cache/wal: no such file or directory"
May 07 11:49:53 instance-2 grafana-agent[17804]: ts=2023-05-07T15:49:53.926131288Z caller=wal.go:474 level=info agent=prometheus instance=<I removed instance-id> msg="WAL checkpoint complete" first=46 last=49 duration=817.8648ms
May 07 12:19:53 instance-2 grafana-agent[17804]: ts=2023-05-07T16:19:53.248281806Z caller=cleaner.go:203 level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=/var/lib/grafana-agent/.cache err="unable to open WAL: open /var/lib/grafana-agent/.cache/wal: no such file or directory"
May 07 12:49:53 instance-2 grafana-agent[17804]: ts=2023-05-07T16:49:53.036460234Z caller=cleaner.go:203 level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=/var/lib/grafana-agent/.cache err="unable to open WAL: open /var/lib/grafana-agent/.cache/wal: no such file or directory"
May 07 12:49:54 instance-2 grafana-agent[17804]: ts=2023-05-07T16:49:54.094024836Z caller=wal.go:409 level=info agent=prometheus instance=<I removed instance-id> msg="series GC completed" duration=106.864899ms
May 07 12:49:54 instance-2 grafana-agent[17804]: ts=2023-05-07T16:49:54.21561055Z caller=checkpoint.go:100 level=info agent=prometheus instance=<I removed instance-id> msg="Creating checkpoint" from_segment=50 to_segment=51 mint=1683477870000
May 07 12:49:54 instance-2 grafana-agent[17804]: ts=2023-05-07T16:49:54.469656064Z caller=wal.go:474 level=info agent=prometheus instance=<I removed instance-id> msg="WAL checkpoint complete" first=50 last=51 duration=482.495692ms
May 07 13:19:53 instance-2 grafana-agent[17804]: ts=2023-05-07T17:19:53.151303299Z caller=cleaner.go:203 level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=/var/lib/grafana-agent/.cache err="unable to open WAL: open /var/lib/grafana-agent/.cache/wal: no such file or directory"

I do see a WAL directory with some files.

instance-2:~$ sudo ls -l /var/lib/grafana-agent/<I removed instance-id>/wal/
total 864K
-rw-r--r-- 1 grafana-agent grafana-agent 288K May  7 11:49 00000052
-rw-r--r-- 1 grafana-agent grafana-agent 288K May  7 12:49 00000053
-rw-r--r-- 1 grafana-agent grafana-agent 271K May  7 13:47 00000054
drwxr-xr-x 2 grafana-agent grafana-agent 4.0K May  7 12:49 checkpoint.00000051

/etc/grafana-agent.yaml contents

server:
  log_level: info

metrics:
  global:
    scrape_interval: 1m
    remote_write:
      - url: https://prometheus-prod-<url>.grafana.net/api/prom/push
        basic_auth:
          username: <userid>
          password: <api key>
  wal_directory: '/var/lib/grafana-agent'
  configs:
    # Example Prometheus scrape configuration to scrape the agent itself for metrics.
    # This is not needed if the agent integration is enabled.
    # - name: agent
    #   host_filter: false
    #   scrape_configs:
    #     - job_name: agent
    #       static_configs:
    #         - targets: ['127.0.0.1:9090']

integrations:
  agent:
    enabled: true
  node_exporter:
    enabled: true
    include_exporter_metrics: true
    disable_collectors:
      - "mdadm"
1 Answer

Answer by Saxtheowl (best answer)

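You don't have a scrape configuration in the metrics section of /etc/grafana-agent.yaml, so the agent never scrapes your Python app. The app needs to expose its metrics on port 8000, and the agent needs a scrape job that points at that port.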

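Add this under the metrics: section of /etc/grafana-agent.yaml (in place of the empty configs: key), replacing <your_python_app_ip_address> with the address of the host running the Python app: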

configs:
  - name: python_app
    scrape_configs:
      - job_name: python_app
        static_configs:
          - targets: ['<your_python_app_ip_address>:8000']

Then restart the agent with sudo systemctl restart grafana-agent. It should work, and it should fix the warning too.
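Before looking for the series in Grafana Cloud, it can help to confirm that the Python app really serves resp_time on port 8000, since the agent can only forward what it scrapes. The following is a minimal sketch using only the standard library. The function names sample_names and fetch_sample_names are hypothetical, the URL assumes the check runs on the same host as the app, and the parsing is deliberately simplified (it only extracts sample names from the text exposition format).

```python
from urllib.request import urlopen


def sample_names(exposition_text):
    """Return the set of sample names found in Prometheus text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith('#'):
            continue
        # The sample name ends at the first '{' (labels) or space (value).
        names.add(line.split('{', 1)[0].split(' ', 1)[0])
    return names


def fetch_sample_names(url='http://localhost:8000/metrics', timeout=5):
    """Fetch the metrics endpoint and return the sample names it serves."""
    with urlopen(url, timeout=timeout) as resp:
        return sample_names(resp.read().decode('utf-8'))


if __name__ == '__main__':
    names = fetch_sample_names()
    # An empty list here means the app is not exposing the Summary at all,
    # so no agent configuration could publish it.
    print(sorted(n for n in names if n.startswith('resp_time')))
```

If the list is empty, the problem is on the Python side rather than in the agent's scrape or remote_write configuration.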