The environment I'm in supports approximately 100 database, web, and misc. other servers (99% Windows). The infrastructure (hardware + network) is managed by consultants, while my group handles system development. The infrastructure folks have installed a SCOM environment that is used to keep track of general system health and so forth, and I'm interested in possibly using that to keep track of application health as well.
Can anyone offer insight into whether SCOM is a good fit for monitoring custom-developed applications with custom rules? No one on my team has experience working with SCOM, and I'm trying to weigh the benefits of learning SCOM vs just running some PowerShell scripts at scheduled intervals that look for warning conditions.
Some things we would want to monitor:
- Scan log files for symptoms that would indicate special conditions, like "hung" services
- Invoke URLs and monitor response times
- Invoke URLs to scan for error messages
- Monitor database query activity, etc.
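For comparison, the scripted route for the first three items might look something like the following minimal sketch. It is written in Python purely for illustration (the same idea ports to PowerShell); the patterns, error markers, and function names are all hypothetical and would need to be replaced with whatever actually signals trouble in your applications.

```python
import re
import time
import urllib.request

# Hypothetical patterns that might suggest a "hung" service; adjust per application.
HUNG_PATTERNS = [
    re.compile(r"timed out waiting", re.I),
    re.compile(r"not responding", re.I),
]

def scan_log_lines(lines, patterns=HUNG_PATTERNS):
    """Return the log lines that match any of the warning patterns."""
    return [line.rstrip("\n") for line in lines
            if any(p.search(line) for p in patterns)]

def check_url(url, error_markers=("Exception", "Error 500"), timeout=10, fetch=None):
    """Fetch a URL and return (elapsed_seconds, error_markers_found_in_body).

    `fetch` can be swapped out (e.g. in tests); by default urllib is used.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=timeout) as resp:
                return resp.read().decode("utf-8", errors="replace")
    start = time.perf_counter()
    body = fetch(url)
    elapsed = time.perf_counter() - start
    found = [marker for marker in error_markers if marker in body]
    return elapsed, found
```

A scheduled task would call `scan_log_lines(open(path))` and `check_url(url)` and raise a warning when lines are returned, markers are found, or the elapsed time exceeds some threshold you choose.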
Having no experience with SCOM (and coming from a development viewpoint), do these kinds of tasks fit well into what SCOM does? Would it make sense to learn SCOM to implement this kind of monitoring vs using PowerShell, bat, or Cygwin scripts, or a tool like Gibraltar?
Yeah, what you're asking is somewhat possible. I spent the best part of a day trying to figure this out and so I thought I'd post what I found for you here.
It is possible to set up monitoring for any log on any machine, as long as you can see that log in Windows with the Event Viewer. (There are actually many more, but if you can see it there, assume it can be monitored.)
Put whatever you are monitoring into its own group in SCOM and set up its own management pack. You can even set up mail alerts with the criteria for a subscription matching 'raised by any instance in a specific group' if you want to be mailed about these alerts.
Here is an example if you wanted to monitor just the 'application' log on a remote server:
Start the Operations Console as a member of the Operations Manager Authors or Administrators role.
In the Operations console, click the Authoring button.
In the navigation pane:
On the Select a Rule Type page:
On the Rule Name and Description page:
On the Event Log Name page, ensure Log name is set to Application, and then click Next.
On the Build Event Expression page:
Specify the following expression:
Parameter Name: Event Level
Operator: Equals
Value: Error
On the Configure Alerts page:
Source: $Data/EventSourceName$
Event ID: $Data/EventDisplayNumber$
Event Category: $Data/EventCategory$
User: $Data/UserName$
Computer: $Data/LoggingComputer$
Event Description: $Data/EventDescription$
Alert Suppression dialog:
1. Click the following fields: Event ID, Event Source, Logging Computer, Event Category, User, Description
2. Click OK.
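If it helps to see what the rule above amounts to, here is a rough conceptual sketch in Python. It is not how SCOM is implemented, just a model of the three pieces: the event expression (Event Level Equals Error), the `$Data/...$` placeholders in the alert description, and suppression of duplicate alerts. The dictionary keys, the shortened template, and the function names are assumptions made up for the illustration.

```python
import re
from collections import Counter

# Alert description template, shortened from the Configure Alerts step.
TEMPLATE = ("Source: $Data/EventSourceName$ "
            "Event ID: $Data/EventDisplayNumber$ "
            "Computer: $Data/LoggingComputer$")

# Hypothetical keys standing in for a few of the suppression fields.
SUPPRESS_ON = ("EventDisplayNumber", "EventSourceName", "LoggingComputer")

def matches(event):
    """The event expression from the rule: Event Level Equals Error."""
    return event.get("EventLevel") == "Error"

def render(template, event):
    """Replace each $Data/Name$ token with the event's value; unknown tokens stay as-is."""
    return re.sub(r"\$Data/(\w+)\$",
                  lambda m: str(event.get(m.group(1), m.group(0))),
                  template)

def build_alerts(events):
    """Filter events, collapse duplicates on SUPPRESS_ON, return (description, count) pairs."""
    counts = Counter()
    first_seen = {}
    for event in filter(matches, events):
        key = tuple(event.get(field) for field in SUPPRESS_ON)
        counts[key] += 1
        first_seen.setdefault(key, event)  # describe the alert using the first matching event
    return [(render(TEMPLATE, first_seen[key]), n) for key, n in counts.items()]
```

Feeding it two identical Error events and one Information event yields a single alert with a count of 2, which is the point of alert suppression: repeated events with the same field values update one alert rather than flooding you with new ones.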
Might seem a little confusing (the poor formatting won't help, sorry), but once it's all there in front of you it'll make sense.
Hope this helps anyway mate,
Lee J