We have an error alarm on API calls in CloudWatch that tracks the sum of the ERROR, FAILURE and FAULT metrics for the API. Each metric uses the SUM statistic, and the alarm is created on a combined metric that is the SUM of all three. The period is 5 minutes and the evaluation window is 15 minutes, so the alarm should need 3 datapoints at or above the threshold of 1 within those 15 minutes. The problem is that I don’t understand why the alarm goes into the ALARM state even though 3 datapoints did not breach the threshold during the 15-minute window. Can anyone help me understand?
The documentation linked here says the alarm is evaluated every minute, but even on that basis the alarm condition does not appear to be satisfied: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarm-evaluation
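To make the expectation above concrete, here is a minimal sketch of the rule as I understand it: the alarm should fire only when the 3 most recent 5-minute datapoints are all at or above the threshold, with missing datapoints counted as not breaching. The names `Datapoint`, `isBreaching` and `wouldAlarm` are hypothetical helpers for illustration, not part of CloudWatch or the CDK, and this is a simplified model rather than how CloudWatch is actually implemented. In particular, it does not model how the real service handles delayed or missing datapoints.

```typescript
// Hypothetical model of the expected alarm rule; not a CloudWatch or CDK API.
// undefined stands for a period with no datapoint.
type Datapoint = number | undefined;

// A datapoint breaches when it exists and is >= threshold.
// Missing data counts as not breaching, mirroring TreatMissingData.NOT_BREACHING.
function isBreaching(value: Datapoint, threshold: number): boolean {
  return value !== undefined && value >= threshold;
}

// Expected rule: every one of the last `evaluationPeriods` datapoints breaches.
function wouldAlarm(
  datapoints: Datapoint[],
  threshold: number,
  evaluationPeriods: number
): boolean {
  if (datapoints.length < evaluationPeriods) {
    return false; // not enough periods to evaluate yet
  }
  const recent = datapoints.slice(-evaluationPeriods);
  return recent.every(value => isBreaching(value, threshold));
}

// One 5-minute period has no data, so the last 3 periods do not all breach.
console.log(wouldAlarm([0, 2, undefined, 1], 1, 3)); // false
// The last 3 periods are 1, 3 and 1, all >= 1.
console.log(wouldAlarm([0, 1, 3, 1], 1, 3)); // true
```

Under this model, a single non-breaching or missing period inside the 15-minute window should keep the alarm out of the ALARM state, which is why the observed behaviour is confusing.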
CDK code that creates the alarm:
const apiMetrics = ['Error', 'Failure', 'Fault'];
const alarmSuffix = `${account.stage}.${account.region}`;
const usingMetrics: { [key: string]: Metric } = {};
apiMetrics.forEach(apiMetric => {
  usingMetrics[apiMetric.toLowerCase()] = new Metric({
    metricName: apiMetric,
    namespace: 'MyService',
    statistic: Statistic.SUM, // Sum of errors, failures
    label: `FetchMetadata_${apiMetric}`,
    period: Duration.minutes(5),
    dimensionsMap: {
      Operation: 'FetchMetadata',
      Program: 'MyService',
      Service: 'MyService'
    }
  });
});
const apiErrorMetric = new MathExpression({
  label: `Errors in MYService API FetchMetadata`,
  expression: `SUM([${Object.keys(usingMetrics).join(',').toLowerCase()}])`,
  usingMetrics
});
// Alarm for API failures
const apiErrorAlarm = new Alarm(scope, `FetchMetadata-Errors-${alarmSuffix}`, {
  alarmName: `FetchMetadata-Errors-${alarmSuffix}`,
  alarmDescription: `There are errors in MyService API FetchMetadata`,
  metric: apiErrorMetric,
  comparisonOperator: ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
  threshold: 1,
  treatMissingData: TreatMissingData.NOT_BREACHING,
  evaluationPeriods: 3
});
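For reference, the template literal passed as `expression` can be checked in isolation. The sketch below rebuilds the same string without the CDK. The placeholder string values are an assumption standing in for the real `Metric` objects, since only the keys of `usingMetrics` affect the result.

```typescript
// Standalone check of the metric math expression string built in the CDK code.
// The string values are placeholders (assumption) for the real Metric objects.
const apiMetrics = ['Error', 'Failure', 'Fault'];
const usingMetrics: { [key: string]: string } = {};
apiMetrics.forEach(apiMetric => {
  usingMetrics[apiMetric.toLowerCase()] = `placeholder for ${apiMetric}`;
});

// Same template literal as in the MathExpression above.
const expression = `SUM([${Object.keys(usingMetrics).join(',').toLowerCase()}])`;
console.log(expression); // SUM([error,failure,fault])
```

So the alarm watches a single combined series. As far as I understand CloudWatch metric math, `SUM` applied to an array of time series returns one time series whose value in each 5-minute period is the total across the three metrics.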