Proactive monitoring of event log using statistical measurement of activity

Windows Event Log is a valuable source of information, and it is usually my first stop when diagnosing a server or desktop after a problem has occurred. What I would really like to have, however, is proactive monitoring: an Event Log that draws my attention to a problem as it is occurring.

Can the event log help you understand the health of your monitored servers and desktops even when you don’t know which specific event IDs to look for?

Existing monitoring tools are good at notifying you of specific events, provided you already know the relevant event IDs. However, these tools cannot notify you about events whose IDs you have not configured in advance.

Most IT problems never “earn” their own KB article and so never become “known” in the first place. As a result, many events are unfamiliar to the IT admin and cannot trigger notifications in existing monitoring tools. (For example, before KB970054 was published, most people didn’t know about this specific problem and were unfamiliar with event ID 333.)

ControlUp has a solution that can monitor irregular event log activity across your entire enterprise. ControlUp’s special counters “Error Rate” and “Warning Rate” calculate the marginal rate of new errors and warnings per minute on each of your systems. For example, if you get two new events within one second, the marginal error rate is 120 per minute. These counters are not specific to a single event ID; rather, they are an aggregate measure of all the error-level or warning-level events in your environment.
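The arithmetic behind the 120-per-minute example can be sketched in a few lines of Python. This is only an illustration of the idea of extrapolating a short burst to a per-minute rate. The function names and the one-second sampling window are assumptions made for this sketch, and it is not ControlUp’s actual implementation.

```python
from datetime import datetime, timedelta


def marginal_rate_per_minute(event_count: int, window_seconds: float) -> float:
    """Extrapolate the events seen in a short window to a per-minute rate."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return event_count * 60.0 / window_seconds


def rate_from_timestamps(timestamps, now, window_seconds=1.0):
    """Count events inside the last `window_seconds` and extrapolate.

    `timestamps` is any iterable of datetime objects for error (or warning)
    events; the window length is a hypothetical sampling interval.
    """
    cutoff = now - timedelta(seconds=window_seconds)
    recent = sum(1 for t in timestamps if cutoff < t <= now)
    return marginal_rate_per_minute(recent, window_seconds)


# Two new events within one second, as in the example above.
print(marginal_rate_per_minute(2, 1))  # 120.0
```

Because the rate is extrapolated from a short window, even a small burst of errors produces a large value, which is what makes the counter useful for spotting sudden surges.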

Event log > ControlUp error rate indicator

ControlUp also has a special customized column called “Stress Level”, which can serve as the health indicator of the resources you monitor. The Stress Level value is derived from other ControlUp counters that you configure. By configuring the “Error Rate” and “Warning Rate” counters to contribute to a server’s overall stress level, you get an indication when your servers or desktops are experiencing a surge of bad events.

High error rate > high stress computer

Adding error rate column

  1. Go to the Stress Settings pane.
  2. Select the computer view if not already selected.
  3. Select the error rate to expand its properties.
  4. Uncheck the “inherit default settings” checkbox. Check “Yellow” and set its value to 2 and its load to 2, then check “Red” and set its value to 3 and its load to 3.
  5. Repeat for the warning rate, then click Apply Settings on the ribbon’s Home tab.

 

  • Note that you might need to fine-tune the values for your environment.
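To make the effect of these settings concrete, here is a rough Python sketch of how per-counter thresholds and loads could roll up into a single stress value. It uses the Yellow/Red values from the steps above. The class and function names are hypothetical, and the assumptions that a threshold is crossed with “greater than or equal” and that loads from different counters are simply added together are mine; ControlUp’s real calculation may differ.

```python
from dataclasses import dataclass


@dataclass
class StressThreshold:
    """Yellow/Red values and the load each contributes (values from the steps above)."""
    yellow: float
    yellow_load: int
    red: float
    red_load: int


def load_for(value: float, t: StressThreshold) -> int:
    """Return the load a single counter contributes (assumed >= comparison)."""
    if value >= t.red:
        return t.red_load
    if value >= t.yellow:
        return t.yellow_load
    return 0


def stress_level(counters: dict, thresholds: dict) -> int:
    """Sum the loads of all configured counters (assumed aggregation rule)."""
    return sum(load_for(counters.get(name, 0.0), t) for name, t in thresholds.items())


settings = {
    "Error Rate": StressThreshold(yellow=2, yellow_load=2, red=3, red_load=3),
    "Warning Rate": StressThreshold(yellow=2, yellow_load=2, red=3, red_load=3),
}

# A machine with a burst of errors and a moderate warning rate.
print(stress_level({"Error Rate": 120.0, "Warning Rate": 2.5}, settings))  # 5
```

With thresholds this low, almost any burst of errors pushes a machine into the Red range, which is why the values usually need tuning to the normal noise level of your environment.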

Stress settings pane – error rate settings

When you receive an alert from the Error Rate counter, you can use ControlUp’s events pane to view the event and drill down into its details using handy shortcuts to popular sites such as Google, EventID and TechNet.

Events pane with search online

In conclusion, ControlUp lets you add proactive monitoring to your toolbox: statistical counters on event log activity put the Event Log to work for you and add another dimension to your system health measurement.

I would also like to emphasize the importance of keeping your event log free of errors and warnings. Don’t dismiss events as unimportant, since they often indicate major issues or conditions that develop into problems if left unattended.
