What is meant by symptom-based monitoring and cause-based monitoring?


In an SRE context, what is meant by symptom-based and cause-based monitoring? Why is it so important? And which tools are used for these kinds of monitoring?

krishg On BEST ANSWER

Symptoms Versus Causes


Your monitoring system should address two questions: what’s broken, and why?

The "what’s broken" indicates the symptom; the "why" indicates a (possibly intermediate) cause. The table below lists some hypothetical symptoms and corresponding causes.

"What" versus "why" is one of the most important distinctions in writing good monitoring with maximum signal and minimum noise.
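
To make the distinction concrete, here is a minimal Python sketch. It is not taken from any monitoring tool: the `Signal` and `Alert` types, the `build_alerts` function, and the signal names are all hypothetical. It assumes one possible policy, in which only firing symptoms raise an alert and firing causes are attached as diagnostic context.

```python
from dataclasses import dataclass, field


@dataclass
class Signal:
    """One monitored condition (hypothetical type for this sketch)."""
    name: str
    kind: str      # "symptom" = what is broken, "cause" = why it may be broken
    firing: bool   # True if the condition is currently detected


@dataclass
class Alert:
    """An alert keyed on a symptom, carrying possible causes as context."""
    symptom: str
    possible_causes: list[str] = field(default_factory=list)


def build_alerts(signals: list[Signal]) -> list[Alert]:
    """Raise one alert per firing symptom; never alert on a cause alone.

    Assumed policy for illustration: causes do not page anyone by
    themselves, they only annotate alerts triggered by symptoms.
    """
    causes = [s.name for s in signals if s.kind == "cause" and s.firing]
    return [
        Alert(symptom=s.name, possible_causes=list(causes))
        for s in signals
        if s.kind == "symptom" and s.firing
    ]


if __name__ == "__main__":
    signals = [
        Signal("http_5xx_rate_high", "symptom", True),
        Signal("db_refusing_connections", "cause", True),
        Signal("cpu_overloaded", "cause", False),
    ]
    for alert in build_alerts(signals):
        print(alert.symptom, "-> possible causes:", alert.possible_causes)
```

In this sketch a firing cause with no firing symptom produces no alert at all, which is one way to keep noise down. Whether causes should also alert on their own is a design decision the sketch does not settle.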

Example

+--------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
|                        Symptom                         |                                                      Cause                                                      |
+--------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
| I’m serving HTTP 500s or 404s                          | Database servers are refusing connections                                                                       |
|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| My responses are slow                                  | CPUs are overloaded by a bogosort, or an Ethernet cable is crimped under a rack, visible as partial packet loss |
|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| Users in Antarctica aren’t receiving animated cat GIFs | Your Content Distribution Network hates scientists and felines, and thus blacklisted some client IPs            |
|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| Private content is world-readable                      | A new software push caused ACLs to be forgotten and allowed all requests                                        |
+--------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+

Source: Site Reliability Engineering (Google), Chapter 6, "Monitoring Distributed Systems"

The tools used for monitoring depend on your platform and on what and how you want to monitor. For example, Azure Monitor covers applications and infrastructure hosted in Azure, Amazon CloudWatch covers those in AWS, and so on.