Agent mode Prometheus shards: duplicate remote write error


I am getting error logs from Prometheus.

My config:
Deployed with Helm,
targetRevision: 47.5.0
agentMode: true
prometheusSpec:
  replicaExternalLabelNameClear: true
  prometheusExternalLabelNameClear: true
  replicas: 1
  shards: 2
  remoteWrite: [ MY MIMIR SERVER ]

Part of the config is shown above. The writes from the two shards appear to overlap, which causes the error below. I suspect this is also why the memory usage of the Prometheus pod increases rapidly. Values such as IPs, URLs, and the tenant name were replaced with sample values for security.

ts=2023-11-15T03:31:17.310Z caller=dedupe.go:112 component=remote level=error remote_name=a72022 url=https://MY-MIMIR-gateway.SAMPLE.com/api/v1/push msg="non-recoverable error" count=2000 exemplarCount=0 err="server returned HTTP status 400 Bad Request: failed pushing to ingester: user=smaple-tenant: the sample-tenant has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp). The affected sample has timestamp 2023-11-15T03:31:04.37Z and is from series {__name__=\"http_client_duration_milliseconds_count\", app=\"MY-SAMPLE_SERVER\", endpoint=\"metrics\", env=\"test\", http_method=\"POST\", http_status_code=\"200\", instance=\"99.99.99.99\", namespace=\"default\", net_peer_name=\"smaple.SAMPLE.com\", net_protocol_name=\"http\", net_protocol_version=\"1.1\", pod=\"MY-SAMPLE_SERVER-rollout-12345-12345\", service=\"MY-SAMPLE_SERVER-svc\"}"
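To make the rejection rule in this error concrete, here is a minimal toy model in Python. It is not Mimir's implementation. It assumes, as suspected above, that both shards end up writing the same series, and that clearing the external labels leaves nothing to tell the shards apart. `ToyIngester`, `shard_labels`, `simulate`, and the `shard-0`/`shard-1` label values are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


class DuplicateTimestampError(Exception):
    """Same series and timestamp, but a different value."""


@dataclass
class ToyIngester:
    # Maps (series labels, timestamp in ms) -> ingested value.
    samples: dict = field(default_factory=dict)

    def push(self, labels: dict, timestamp_ms: int, value: float) -> None:
        # The full label set is the series identity in this toy model.
        key = (frozenset(labels.items()), timestamp_ms)
        existing = self.samples.get(key)
        if existing is not None and existing != value:
            raise DuplicateTimestampError(
                f"timestamp {timestamp_ms}: {existing} already ingested, got {value}"
            )
        self.samples[key] = value


def shard_labels(shard: str, keep_external_labels: bool) -> dict:
    labels = {
        "__name__": "http_client_duration_milliseconds_count",
        "instance": "99.99.99.99",
    }
    if keep_external_labels:
        # Hypothetical value; stands in for a per-shard external label.
        labels["prometheus_replica"] = shard
    return labels


def simulate(keep_external_labels: bool) -> bool:
    """Return True if the second shard's push is rejected."""
    ingester = ToyIngester()
    ts = 1700019064370  # 2023-11-15T03:31:04.37Z as Unix milliseconds
    ingester.push(shard_labels("shard-0", keep_external_labels), ts, 10.0)
    try:
        ingester.push(shard_labels("shard-1", keep_external_labels), ts, 11.0)
    except DuplicateTimestampError:
        return True
    return False
```

In this sketch, `simulate(False)` models the setup with both labels cleared: the second push collides with the first and is rejected. `simulate(True)` keeps a per-shard label, so the two pushes land in different series. Whether the real shards actually scrape the same target is not confirmed by the log alone.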

Has anyone experienced a similar problem?
If you know a solution, I would appreciate it if you could share it.

I would like to resolve the errors caused by the overlapping writes from the shards.
