Collect performance metrics using Google Cloud Ops Agent and send to Google Cloud Monitoring


I'm looking for a general way to collect performance metrics on several Linux VM instances (Azure, GCP, other) and monitor the metrics in GCP.

On an Ubuntu VM in Azure, I have installed the Google Cloud Ops Agent, which uses Fluent Bit (to collect logs) and the OpenTelemetry Collector (to collect performance metrics) behind the scenes.

I added overrides for the two services to set environment variables so that they pick up the service account JSON credentials file, as follows:

  • google-cloud-ops-agent-fluent-bit.service: GOOGLE_SERVICE_CREDENTIALS
  • google-cloud-ops-agent-opentelemetry-collector.service: GOOGLE_APPLICATION_CREDENTIALS
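An override of this kind can be sketched as below. This is only an illustration of the general shape, not necessarily the exact file used here: `render_override` is a hypothetical helper and the key path is a placeholder.

```shell
# Print a systemd drop-in that sets one environment variable for a unit.
# Arguments: variable name, value (here: path to the service account key).
render_override() {
  printf '[Service]\nEnvironment="%s=%s"\n' "$1" "$2"
}

# Hypothetical key location; substitute the real path of the JSON file.
CREDS=/etc/ops-agent/service-account.json

# Show the drop-in that would be written for the metrics collector.
render_override GOOGLE_APPLICATION_CREDENTIALS "$CREDS"

# To install it (drop-in directory as reported by systemctl status):
#   D=/etc/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service.d
#   sudo mkdir -p "$D"
#   render_override GOOGLE_APPLICATION_CREDENTIALS "$CREDS" | sudo tee "$D/override.conf"
#   sudo systemctl daemon-reload
#   sudo systemctl restart google-cloud-ops-agent-opentelemetry-collector.service
```

The same helper would work for the Fluent Bit unit with GOOGLE_SERVICE_CREDENTIALS as the variable name.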

See this post for more details on authentication.

I could see log messages appearing in Google Cloud Logging, which must have been collected and sent by google-cloud-ops-agent-fluent-bit.service. However, I couldn't find any performance metrics from google-cloud-ops-agent-opentelemetry-collector. Where should I expect to find these in GCP? I'm convinced that I need some additional configuration to get this working, but the documentation seems to cover only running Ops Agent on GCP Compute Engine instances.

Update 1:

I can see that the service is running (sudo systemctl status google-cloud-ops-agent-opentelemetry-collector.service), but I now see errors that I had missed before, which might explain why metrics are not making it to Google Cloud:

exporterhelper/queued_retry.go:215        Exporting failed. Will retry the request after interval.        {"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: No matching retention policy was found for one or more points.: timeSeries[0]\nerror details: name = Unknown  desc = total_point_count:1  errors:{status:{code:9}  point_count:1}", "interval": "5.52330144s"}

I don't know where to find the logs for the service other than the excerpt printed by systemctl status.
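Since the lines printed by systemctl status come from the systemd journal, the full log for the unit should be readable with journalctl. A minimal sketch, assuming the collector keeps logging to the journal; `exporter_errors` is a hypothetical helper name.

```shell
# Keep only the exporter failure lines from log text read on stdin.
exporter_errors() {
  grep -F -- 'Exporting failed'
}

# Usage on the VM (not run here):
#   sudo journalctl -u google-cloud-ops-agent-opentelemetry-collector.service \
#     --no-pager --since today | exporter_errors
# Add -f to journalctl to follow new entries as they arrive.
```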

The command line for the service is /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml. I looked in the config file and saw a few mentions of googlecloud as an exporter, e.g.

exporters:
  googlecloud:
    metric:
      prefix: ""
    user_agent: Google-Cloud-Ops-Agent-Metrics/2.11.0 (BuildDistro=focal;Platform=linux;ShortName=ubuntu;ShortVersion=20.04)

Update 2: Output of service status

● google-cloud-ops-agent-opentelemetry-collector.service - Google Cloud Ops Agent - Metrics Agent
     Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service; static; vendor preset: enabled)
    Drop-In: /etc/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service.d
             └─override.conf
     Active: active (running) since Tue 2022-03-15 06:36:44 UTC; 1 day 17h ago
    Process: 1053790 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=otel -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY} (code=exited, status=0/SUCCESS)
   Main PID: 1053796 (otelopscol)
      Tasks: 10 (limit: 19198)
     Memory: 381.2M
     CGroup: /system.slice/google-cloud-ops-agent-opentelemetry-collector.service
             └─1053796 /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml

Mar 16 23:47:37 HOSTNAME otelopscol[1053796]: go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
Mar 16 23:47:37 HOSTNAME otelopscol[1053796]:         /root/go/pkg/mod/go.opentelemetry.io/[email protected]/exporter/exporterhelper/metrics.go:134
Mar 16 23:47:37 HOSTNAME otelopscol[1053796]: go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
Mar 16 23:47:37 HOSTNAME otelopscol[1053796]:         /root/go/pkg/mod/go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry_inmemory.go:105
Mar 16 23:47:37 HOSTNAME otelopscol[1053796]: go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
Mar 16 23:47:37 HOSTNAME otelopscol[1053796]:         /root/go/pkg/mod/go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:99
Mar 16 23:47:37 HOSTNAME otelopscol[1053796]: go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
Mar 16 23:47:37 HOSTNAME otelopscol[1053796]:         /root/go/pkg/mod/go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:78
Mar 16 23:47:37 HOSTNAME otelopscol[1053796]: 2022-03-16T23:47:37.980Z        info        exporterhelper/queued_retry.go:215        Exporting failed. Will retry the request after interval.        {"kind": "exporter", "name": "googlecloud", "error": "[rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: No matching retention policy was found for one or more points.: timeSeries[0-199]\nerror details: name = Unknown  desc = total_point_count:200  errors:{status:{code:9}  point_count:200}; rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: No matching retention policy was found for one or more points.: timeSeries[0-199]\nerror details: name = Unknown  desc = total_point_count:200  errors:{status:{code:9}  point_count:200}; rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: No matching retention policy was found for one or more points.: timeSeries[0-199]\nerror details: name = Unknown  desc = total_point_count:200  errors:{status:{code:9}  point_count:200}; rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: No matching retention policy was found for one or more points.: timeSeries[0-111]\nerror details: name = Unknown  desc = total_point_count:112  errors:{status:{code:9}  point_count:112}]", "interval": "10.435795045s"}
Mar 16 23:47:49 HOSTNAME otelopscol[1053796]: 2022-03-16T23:47:49.299Z        info        exporterhelper/queued_retry.go:215        Exporting failed. Will retry the request after interval.        {"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: No matching retention policy was found for one or more points.: timeSeries[0-4]\nerror details: name = Unknown  desc = total_point_count:5  errors:{status:{code:9}  point_count:5}", "interval": "44.913550864s"}
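To gauge how many points are being rejected, the failure lines can be summed up. A rough sketch, assuming the log format shown above stays the same; `count_failed_points` is a hypothetical helper name.

```shell
# Sum every total_point_count value found on "Exporting failed" lines (stdin).
count_failed_points() {
  awk '/Exporting failed/ {
    line = $0
    # A single line can carry several batched errors, so scan all matches.
    while (match(line, /total_point_count:[0-9]+/)) {
      sum += substr(line, RSTART + 18, RLENGTH - 18)  # 18 = length of "total_point_count:"
      line = substr(line, RSTART + RLENGTH)
    }
  } END { print sum + 0 }'
}

# Usage on the VM (not run here):
#   sudo journalctl -u google-cloud-ops-agent-opentelemetry-collector.service \
#     --no-pager --since "1 hour ago" | count_failed_points
```

Because failed batches are retried, the same points may be counted more than once, so the total is an upper bound rather than an exact figure.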
