Rolling window function in Google Cloud monitoring


I am setting up an uptime check in Google Cloud Monitoring. When I add the check, I also create an alerting policy. I set the check to global, which means it runs from 6 different locations, and I set the check frequency.

The alerting policy that was created (in the Cloud console, when I created the uptime check) has its rolling window set to 20 minutes and its rolling window function set to "next older".

I am trying to understand the following:

  1. Rolling window 20 min: Does that mean I will get one alert every 20 min or what is affected by this 20 minute window?
  2. next older: Does this mean it somehow uses data from the previous 20 minute slot? I don't understand this one.

Finally, under Configure trigger there is also:

Retest window: Does it do any retest in that period or does it just hold off on alerting for that amount of time to confirm it did not resolve?

There is 1 answer

Sai Chandini Routhu On

The official documentation provides two examples that describe when a condition is met. The following content is taken from those docs.

Rolling window:

With a rolling time window (or real-time window), metrics are calculated over a particular alignment period that slides forward in time. For example, with an alignment period of five minutes, at 1:00 PM the alignment period contains the samples received between 12:55 PM and 1:00 PM. At 1:01 PM, the alignment period slides one minute and contains the samples received between 12:56 PM and 1:01 PM.
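The sliding behaviour described above can be sketched in a few lines of Python. This is only an illustration, not how Cloud Monitoring is implemented: the function names, the sample data, and the choice to treat the window as open at its start and closed at its end are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def rolling_window(now, alignment_period):
    """Return the (start, end) bounds of the window that ends at `now`."""
    return now - alignment_period, now

def samples_in_window(samples, now, alignment_period):
    """Keep samples whose timestamp satisfies start < ts <= end (assumed bounds)."""
    start, end = rolling_window(now, alignment_period)
    return [(ts, value) for ts, value in samples if start < ts <= end]

# Hypothetical data: one sample per minute from 12:50 PM to 1:01 PM,
# where each value is simply the sample's index.
base = datetime(2024, 1, 1, 12, 50)
samples = [(base + timedelta(minutes=i), i) for i in range(12)]
period = timedelta(minutes=5)

at_1300 = samples_in_window(samples, datetime(2024, 1, 1, 13, 0), period)
at_1301 = samples_in_window(samples, datetime(2024, 1, 1, 13, 1), period)

# One minute later the window drops its oldest sample and gains the newest one.
values_1300 = [value for _, value in at_1300]
values_1301 = [value for _, value in at_1301]
```

So the rolling window does not control how often alerts are sent; it only determines which samples are combined into each aligned value.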

image

The first graph illustrates a combined case with an alignment period set to 5 minutes and a duration window of 3 minutes. The example illustrates when the aligned value exceeds the threshold and when the condition is met.

Retest window (referred to as the duration window in the examples here):

Even though the aligned value is greater than the threshold at time start + 2 minutes, the condition doesn't trigger until the aligned value is greater than the threshold for three minutes. That event occurs at time start + 5 minutes.

image

Example:

This alerting policy contains one metric-threshold condition that specifies a five-minute duration window.

Suppose you want to open an incident and send an email to your support team only if the HTTP response latency is greater than two seconds and stays above that threshold for five minutes.

  1. HTTP latency is less than two seconds.

  2. HTTP latency exceeds two seconds for the following three minutes.

  3. The latency in the subsequent measurement is less than two seconds, so the condition resets the duration window.

  4. HTTP latency exceeds two seconds for the following five minutes, so the condition is met.

Note: Set the duration window to be long enough to minimize false positives, but short enough to ensure that incidents are opened in a timely manner.
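As a rough sketch of the latency sequence above, the following Python models the duration window as a counter of consecutive violating measurements that resets whenever one measurement falls back under the threshold. It is a simplification under stated assumptions: one aligned value per minute, a hypothetical function name, and made-up latency values. It does not reproduce the actual evaluation logic of Cloud Monitoring.

```python
def first_trigger_minute(values, threshold, duration_minutes):
    """Return the index of the measurement at which the condition is met, or None.

    Assumes one aligned value per minute; the condition is met once
    `duration_minutes` consecutive values exceed `threshold`.
    """
    consecutive = 0
    for minute, value in enumerate(values):
        if value > threshold:
            consecutive += 1
            if consecutive >= duration_minutes:
                return minute
        else:
            # A single non-violating measurement resets the duration window.
            consecutive = 0
    return None

# Hypothetical latencies in seconds, one per minute:
# below threshold, three minutes above, one below (reset), five minutes above.
latencies = [1.0, 2.5, 2.5, 2.5, 1.5, 2.4, 2.4, 2.4, 2.4, 2.4]

met_with_5_min = first_trigger_minute(latencies, threshold=2.0, duration_minutes=5)
met_with_3_min = first_trigger_minute(latencies, threshold=2.0, duration_minutes=3)
```

With a five-minute duration window the first three-minute spike never opens an incident, while a three-minute window would have triggered on it. In this model nothing is actively retested: the condition simply has to keep being violated for the whole window before an incident opens.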

Next older:

To retain only the most recent sample within an alignment period, use the next older aligner. This aligner is commonly used with uptime checks and is a good choice when you only care about the most recent value. This aligner is valid only for gauge metrics.
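To make the "most recent sample" idea concrete, here is a minimal Python sketch of what a next-older style alignment does to the samples in one alignment period. The function name, the use of plain integer minutes as timestamps, the window bounds, and the sample data are assumptions chosen for illustration, not the service's actual implementation.

```python
def align_next_older(samples, period_end, alignment_period):
    """Return the value of the most recent sample in the alignment period.

    `samples` is a list of (minute, value) pairs. The period is assumed to
    cover period_end - alignment_period < minute <= period_end.
    Returns None if the period contains no samples.
    """
    start = period_end - alignment_period
    in_period = [(ts, value) for ts, value in samples if start < ts <= period_end]
    if not in_period:
        return None
    # Keep only the newest sample; older ones in the period are ignored.
    return max(in_period, key=lambda sample: sample[0])[1]

# Hypothetical uptime check results: (minute, check_passed)
checks = [(1, True), (6, True), (11, False), (16, True)]

at_minute_15 = align_next_older(checks, period_end=15, alignment_period=20)
at_minute_20 = align_next_older(checks, period_end=20, alignment_period=20)
```

In other words, "next older" does not pull data from the previous 20-minute slot. Within the current 20-minute window it keeps only the latest check result and discards the earlier ones.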