Does SCOM make sense as an application monitoring tool (used by development group)?

The environment I'm in supports approximately 100 database, web, and misc. other servers (99% Windows). The infrastructure (hardware + network) is managed by consultants, while my group handles system development. The infrastructure folks have installed a SCOM environment that is used to keep track of general system health and so forth, and I'm interested in possibly using that to keep track of application health as well.

Can anyone offer insight into whether SCOM is a good fit for monitoring custom developed applications with custom rules? No one on my team has experience working with SCOM, and I'm trying to weigh the benefits of learning SCOM vs just running some Powershell scripts at scheduled intervals that look for warning conditions.

Some things we would want to monitor:

  • Scan log files for symptoms that would indicate special conditions, like "hung" services
  • Invoke URLs and monitor response times
  • Invoke URLs to scan for error messages
  • Monitor database query activity, etc.

Having no experience with SCOM (and coming from a development viewpoint), do these kinds of tasks fit well into what SCOM does? Would it make sense to learn SCOM to implement this kind of monitoring vs using Powershell, bat or cygwin scripts, or a tool like Gibraltar?
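
For comparison, here is a minimal sketch of what the script-based alternative could look like for the first three items in the list, written in Python rather than PowerShell. The patterns, error markers, and function names are all illustrative assumptions, not part of any existing tool.

```python
import re
import time
import urllib.request

# Hypothetical symptom patterns; replace with whatever your applications log.
WARNING_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bhung\b", r"\btimeout\b", r"\bfatal\b")
]


def scan_lines(lines, patterns=WARNING_PATTERNS):
    """Return (line_number, line) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in patterns):
            hits.append((number, line.rstrip("\n")))
    return hits


def check_url(url, error_markers=("Exception", "Server Error"), timeout=10):
    """Fetch a URL and return (elapsed_seconds, markers_found_in_body).

    Note: urlopen raises an exception for HTTP 4xx/5xx responses and for
    timeouts, so a real script would catch those and treat them as failures.
    """
    started = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    elapsed = time.monotonic() - started
    return elapsed, [m for m in error_markers if m in body]
```

A scheduled task could call `scan_lines(open(path))` and `check_url(...)` and send a mail when either returns something suspicious. What a script like this does not give you is the alert history, suppression, and notification routing that a monitoring product provides.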

There is 1 answer

Lee J On

Yeah, what you're asking is somewhat possible. I spent the best part of a day trying to figure this out and so I thought I'd post what I found for you here.

It is possible to set up monitoring for any log on any machine. The rule of thumb: if you can see the log in the Windows Event Viewer, assume it can be monitored (there are actually many more that qualify).

Put whatever you are monitoring into its own group in SCOM and set up its own management pack. If you want to be mailed about these alerts, you can even set up a subscription whose criteria match 'raised by any instance in a specific group'.

Here is an example if you wanted to monitor just the 'application' log on a remote server:

  1. Start the Operations Console as a member of the Operations Manager Authors or Administrators role.

  2. In the Operations console, click the Authoring button.

  3. In the navigation pane:

    1. Expand Authoring, and then expand Management Pack Objects.
    2. Right-click Rules, and then click Create a new rule... to start the Create Rule Wizard.
  4. On the Select a Rule Type page:

    1. Expand Alert Generating Rules, expand Event Based, and then click NT Event Log (Alert).
    2. Select the destination management pack from the list (Windows Core Library - Customizations) or click New... to create a management pack.
    3. Click Next.
  5. On the Rule Name and Description page:

    1. In the Rule name box, type Application Event Log Error.
    2. Optionally, type a description for the rule.
    3. Click Select to select the item to target.
    4. In the Select Items to Target dialog, select Windows Computer, and then click OK.
    5. Ensure the Rule is enabled option is checked and then click Next.
  6. On the Event Log Name page, ensure Log name is set to Application, and then click Next.

  7. On the Build Event Expression page:

    1. Specify the following expression:

      Parameter Name    Operator    Value
      Event Level       Equals      Error

    2. Click Next.
  8. On the Configure Alerts page:

    1. In the Alert description box, specify the following:

       Source: $Data/EventSourceName$
       Event ID: $Data/EventDisplayNumber$
       Event Category: $Data/EventCategory$
       User: $Data/UserName$
       Computer: $Data/LoggingComputer$
       Event Description: $Data/EventDescription$

    2. In the Severity option, click Warning.
    3. Click Alert suppression... to define the handling of duplicate alerts. In the Alert Suppression dialog:

      1. Click the following fields: Event ID, Event Source, Logging Computer, Event Category, User, Description.
      2. Click OK.
  9. Click Create. Repeat the process to create a similar alert for errors in any other event log.
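
To make the rule's behaviour concrete, here is a rough Python model of what the wizard settings above amount to: keep only Error-level events, fill in the alert description, and fold duplicates together using the suppression fields. This is only a conceptual sketch. It does not call SCOM, the `Event` class and `process` function are made up for illustration, and the idea that a suppressed duplicate bumps a repeat count on the existing alert is my understanding of how suppression behaves.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a Windows event log entry."""
    level: str
    source: str
    event_id: int
    category: str
    user: str
    computer: str
    description: str


# Mirrors the alert description from the Configure Alerts page,
# with the $Data/...$ placeholders swapped for Python format fields.
TEMPLATE = (
    "Source: {source} Event ID: {event_id} Event Category: {category} "
    "User: {user} Computer: {computer} Event Description: {description}"
)


def suppression_key(event):
    """Fields ticked in the Alert Suppression dialog."""
    return (event.event_id, event.source, event.computer,
            event.category, event.user, event.description)


def process(events):
    """Return a dict of alerts keyed by suppression key."""
    alerts = {}
    for event in events:
        if event.level != "Error":      # Event Level Equals Error
            continue
        key = suppression_key(event)
        if key in alerts:
            # Duplicate: count it instead of raising a second alert.
            alerts[key]["repeat_count"] += 1
        else:
            alerts[key] = {
                "severity": "Warning",  # Severity chosen in the wizard
                "description": TEMPLATE.format(**asdict(event)),
                "repeat_count": 0,
            }
    return alerts
```

Feeding this two identical Error events and one Information event yields a single alert with a repeat count of 1, which is why ticking more suppression fields gives you more (and more specific) alerts, while ticking fewer collapses them together.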

It might seem a little confusing (the poor formatting won't help, sorry), but once it's all there in front of you it'll make sense.

Hope this helps anyway mate,

Lee J