NestJS OpenTelemetry - Failure to collect metrics using Telegraf


I'm trying to auto-instrument my NestJS project with OpenTelemetry using the nestjs-otel package. I followed its instructions and applied the corrections suggested in one of its open issues.

This is my main otelSDK configuration:

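import { NodeSDK } from '@opentelemetry/sdk-node';
import { PrometheusExporter } from '@opentelemetry/exporter-prometheus';
import { AsyncLocalStorageContextManager } from '@opentelemetry/context-async-hooks';
import { PinoInstrumentation } from '@opentelemetry/instrumentation-pino';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { NestInstrumentation } from '@opentelemetry/instrumentation-nestjs-core';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

export const otelSDK = new NodeSDK({
  // Serves metrics in Prometheus text format on port 8125 (default path /metrics)
  metricReader: new PrometheusExporter({
    port: 8125,
  }),
  contextManager: new AsyncLocalStorageContextManager(),
  instrumentations: [
    new PinoInstrumentation(),
    new HttpInstrumentation(),
    new NestInstrumentation(),
    getNodeAutoInstrumentations(),
  ],
});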

When running the service locally, the metrics work: visiting http://localhost:8125/metrics shows them coming in:

...
# HELP http_server_duration Measures the duration of inbound HTTP requests.
# UNIT http_server_duration ms
# TYPE http_server_duration histogram
http_server_duration_count{http_scheme="http",http_method="GET",net_host_name="localhost",http_flavor="1.1",http_status_code="200",net_host_port="8125"} 3
http_server_duration_sum{http_scheme="http",http_method="GET",net_host_name="localhost",http_flavor="1.1",http_status_code="200",net_host_port="8125"} 933.854501
http_server_duration_bucket{http_scheme="http",http_method="GET",net_host_name="localhost",http_flavor="1.1",http_status_code="200",net_host_port="8125",le="0"} 0
http_server_duration_bucket{http_scheme="http",http_method="GET",net_host_name="localhost",http_flavor="1.1",http_status_code="200",net_host_port="8125",le="5"} 0
http_server_duration_bucket{http_scheme="http",http_method="GET",net_host_name="localhost",http_flavor="1.1",http_status_code="200",net_host_port="8125",le="10"} 0
...

I'm deploying the service on Kubernetes and using telegraf-operator to inject a Telegraf sidecar that collects the metrics. I've added the following annotations to my Deployment resource:

        telegraf.influxdata.com/class: influxdb
        telegraf.influxdata.com/inputs: |+
          [[inputs.prometheus]]
            urls = ["http://localhost:{{ .Values.deployment.metrics.port }}{{ .Values.deployment.metrics.route }}"]
            metric_version = 1

However, when running the service on Kubernetes, I get the following error:

[inputs.prometheus] Error in plugin: error reading metrics for http://localhost:8125/metrics: reading text format failed: text format parsing error in line X: second HELP line for metric name "http_server_duration"

As I understand it, there's a mismatch between the metrics format and what the Telegraf input plugin expects. I'm not sure which plugin I should use, or whether I need to change any configuration for this to work.
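The error comes from the Prometheus text-format parser, which rejects an exposition in which the same metric name has more than one # HELP line. A minimal TypeScript sketch for checking a /metrics response for that condition follows; the function name findDuplicateHelp is hypothetical and not part of any library.

```typescript
// Hypothetical helper: return every metric name that has more than one
// "# HELP" line in a Prometheus text-format exposition. This is the
// condition reported as "second HELP line for metric name".
function findDuplicateHelp(exposition: string): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const rawLine of exposition.split("\n")) {
    const line = rawLine.trim();
    // HELP lines have the shape: "# HELP <metric_name> <docstring>"
    if (!line.startsWith("# HELP ")) continue;
    const name = line.slice("# HELP ".length).split(/\s+/, 1)[0];
    if (!name) continue;
    if (seen.has(name)) {
      duplicates.add(name);
    } else {
      seen.add(name);
    }
  }
  return Array.from(duplicates);
}

// Usage: the same metric family emitted twice, as in the failing deployment.
const sample = [
  "# HELP http_server_duration Measures the duration of inbound HTTP requests.",
  "# TYPE http_server_duration histogram",
  "# HELP http_server_duration Measures the duration of inbound HTTP requests.",
  "# TYPE http_server_duration histogram",
].join("\n");
console.log(findDuplicateHelp(sample)); // [ 'http_server_duration' ]
```

To check the running service, you could fetch http://localhost:8125/metrics (for example with the global fetch available in Node 18+) and pass the response body to this function; a non-empty result names the metric families that Telegraf will refuse to parse.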

Any help would be appreciated.


Answer by Yonatan:

I found that the issue occurred because the http_server_duration metric was exposed twice. I had to remove new HttpInstrumentation() and getNodeAutoInstrumentations() to get rid of the duplicate, and after that the error was gone.
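The fix above removes the overlapping entries by hand. Note that getNodeAutoInstrumentations() already bundles the HTTP, NestJS and Pino instrumentations, so listing them separately as well registers them twice. If you would rather keep the auto-instrumentation list, one option is to drop repeated entries by their instrumentationName before handing the list to NodeSDK. This is a hypothetical sketch (dedupeInstrumentations is not a library function), and it only helps under the assumption that the duplicate metric comes from the same instrumentation being registered twice.

```typescript
// Hypothetical helper: flatten a NodeSDK-style instrumentation list
// (single instances or arrays, e.g. from getNodeAutoInstrumentations())
// and keep only the first entry registered under each instrumentationName.
interface NamedInstrumentation {
  instrumentationName: string;
}

function dedupeInstrumentations<T extends NamedInstrumentation>(
  entries: Array<T | T[]>,
): T[] {
  // Map preserves insertion order, so earlier entries win.
  const byName = new Map<string, T>();
  for (const entry of entries) {
    const items = Array.isArray(entry) ? entry : [entry];
    for (const item of items) {
      if (!byName.has(item.instrumentationName)) {
        byName.set(item.instrumentationName, item);
      }
    }
  }
  return Array.from(byName.values());
}

// Usage sketch, assuming the same imports as the question's configuration:
// instrumentations: dedupeInstrumentations([
//   new HttpInstrumentation(),
//   getNodeAutoInstrumentations(),
// ]),
```

Keeping the first occurrence means an explicitly configured instrumentation placed at the front of the list takes precedence over the default one from the auto-instrumentation bundle.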