Prometheus Expressions for CPU and Memory Usage Conditions


I'm setting up Grafana alerts and need guidance on the conditions. I want two separate alerts to trigger if, over the last 10 minutes, the average CPU usage in any pod (across all namespaces) exceeds 90% of the respective pod's CPU limit, and similarly for Memory usage.

Can someone help with the expressions for these scenarios?

I tried this for Memory usage: avg_over_time(container_memory_usage_bytes[10m]) >= kube_pod_container_resource_limits{resource="memory"} * 0.9

This didn't work. I expect the query to return every pod whose average CPU/Memory usage over the last 10 minutes exceeds 90% of that pod's CPU/Memory limit.

Update: I think I managed to build one of the queries I wanted, but only for a specific pod. Here is the query for Memory: avg_over_time(container_memory_usage_bytes{pod="nginx-f7d787f6c-t8x9s", container="nginx"}[10m]) > on(pod_uid) kube_pod_container_resource_limits{resource="memory", pod="nginx-f7d787f6c-t8x9s", container="nginx"} * 0.9

I need this query to run for all pods in the cluster, not just for one specific pod.

Example:

Pod name | Containers | Avg. memory usage (last 10 minutes) | Memory limit
Pod1     | 5          | 9Mi                                 | 10Mi
Pod2     | 3          | 9Mi                                 | 1000Mi

Since the containers of Pod1 are using 90% of their Memory limit, I expect them to show in the query result.
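
For the cluster-wide case, one possible sketch is shown here. It assumes that the cAdvisor metrics and kube-state-metrics (v2 or later) both carry `namespace`, `pod`, and `container` labels under exactly those names; older versions or custom relabeling rules may use different names, so check the labels in your own setup first. The attempts above probably fail because, without `on(...)`, PromQL only pairs series whose label sets are identical, which is rarely true across two exporters. `on(pod_uid)` likely worked for a single pod only because each side of the comparison held exactly one series.

```promql
# Memory: containers whose 10-minute average usage is at least 90% of their limit.
# container!="" is assumed to drop the pod-level aggregate series from cAdvisor.
# max by (...) collapses any duplicate series so the match is one-to-one.
max by (namespace, pod, container) (
  avg_over_time(container_memory_usage_bytes{container!=""}[10m])
)
>= on (namespace, pod, container)
0.9 * max by (namespace, pod, container) (
  kube_pod_container_resource_limits{resource="memory"}
)
```

```promql
# CPU: rate() over 10m gives the average number of cores used in that window,
# which is compared against 90% of the CPU limit (also expressed in cores).
max by (namespace, pod, container) (
  rate(container_cpu_usage_seconds_total{container!=""}[10m])
)
>= on (namespace, pod, container)
0.9 * max by (namespace, pod, container) (
  kube_pod_container_resource_limits{resource="cpu"}
)
```

Both expressions return one series per container that crosses the threshold, with the measured usage as the value. Containers without a limit have no matching series on the right-hand side and are therefore dropped from the result.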
