ControlUp Deep Dive – The Incidents Pane, Part 2

Two weeks ago we published a post covering the basics of the Incidents pane. You can check it out by clicking here. In this follow-up post, we’ll dig deeper into the Incidents pane.

Grouping Incidents by Columns

By using the ‘Group by columns’ option, you can gain additional insights from the recorded incidents. One good example is the ‘Counter Name’ column: by grouping the Computer Stress incidents by ‘Counter Name’ (and removing the default ‘Computer’ column), you can quickly see which performance metrics are the major bottlenecks in your environment. As seen in Figure D below, Virtual Disk Reads/Writes KBps are the major bottlenecks in our lab environment.

Figure D
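
For readers who prefer to work with exported incident data outside the console, the same ‘group by column’ idea can be sketched in a few lines of Python. This is only an illustration: the records, the field names, and the counter values below are hypothetical and are not a documented ControlUp export format.

```python
from collections import Counter

# Hypothetical sample of Computer Stress incidents. The keys mirror the
# pane's column titles, but the structure and values are assumptions.
incidents = [
    {"Computer": "VDI-01", "Counter Name": "Virtual Disk Read KBps"},
    {"Computer": "VDI-02", "Counter Name": "Virtual Disk Write KBps"},
    {"Computer": "VDI-01", "Counter Name": "CPU"},
    {"Computer": "VDI-03", "Counter Name": "Virtual Disk Read KBps"},
]


def count_by_column(rows, column):
    """Return (value, count) pairs for one column, most frequent first."""
    return Counter(row[column] for row in rows).most_common()


if __name__ == "__main__":
    # Equivalent to grouping by 'Counter Name' and dropping 'Computer'.
    for name, count in count_by_column(incidents, "Counter Name"):
        print(f"{name}: {count}")
```

Swapping the column argument for "Computer" would give the default per-machine view instead.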

Creating Custom Incident Triggers

ControlUp allows for the creation of custom incident triggers. You can find instructions for creating these in a blog post called ControlUp Deep Dive – Triggers and E-Mail Alerts.

Real Life Use Cases

As previously explained, the ControlUp Incidents pane can be used for a variety of purposes. Even so, some use cases tend to be especially common. Let’s review the following real-world examples.

DIAGNOSING EXCESSIVE CPU USAGE

High CPU usage is a common root cause of a slow user experience. The Incidents pane can help you quickly identify, from a historical perspective, which apps and services are the major CPU “hogs” in your environment.

To begin the troubleshooting process, drill down to the Process Stress incidents and use the ‘Group by columns’ option to select the following columns:

  • Counter Name
  • Image Name

The Incidents pane will now list all processes that caused stress events in the last 14 days, along with the relevant counter name. In this example, we are interested only in the CPU-related records. As seen in Figure E below, MsMpEng.exe (the Microsoft Antimalware Service Executable) triggered 128 stress events in that period, and WmiPrvSE.exe (the WMI Provider Host) triggered 107.

Figure E
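
The two-column grouping used here, followed by the CPU filter, can also be expressed as a short Python sketch for anyone post-processing incident records offline. The records and the "CPU" counter label are hypothetical placeholders, not values taken from a real export.

```python
from collections import Counter

# Hypothetical Process Stress records; keys follow the two columns
# selected above, and the values are invented for illustration.
records = [
    {"Counter Name": "CPU", "Image Name": "MsMpEng.exe"},
    {"Counter Name": "CPU", "Image Name": "WmiPrvSE.exe"},
    {"Counter Name": "Memory", "Image Name": "chrome.exe"},
    {"Counter Name": "CPU", "Image Name": "MsMpEng.exe"},
]


def top_offenders(rows, counter="CPU"):
    """Group by (Counter Name, Image Name), keep one counter, rank by count."""
    groups = Counter((r["Counter Name"], r["Image Name"]) for r in rows)
    ranked = [(image, n) for (name, image), n in groups.items() if name == counter]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for image, n in top_offenders(records):
        print(f"{image}: {n} stress events")
```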

APPLICATION CRASH ANALYSIS

The Incidents pane is also useful for troubleshooting application crashes. Imagine that users have been complaining that their applications frequently crash. In this situation, you need to find out which applications are crashing and how often. Only then can you begin looking for the root cause of the problem. Fortunately, the Incidents pane can give you the information that you need.

By drilling down into the Windows Events incidents and searching for “faulting application”, we can see all application crashes recorded in the last 14 days. As seen in Figure F below, we had 396 application crashes in that period.

Figure F

To see the details of the individual crashes, simply drill down into the Event ID 1000 line. As seen in Figure G below, crash details such as the application name and exception code are available for each crash. This list can also be exported to Excel for easier filtering and grouping.

Figure G
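
Once the crash list has been exported, the filtering and grouping can be scripted as well. The sketch below assumes that each exported row carries the raw Windows “Application Error” message text, which is an assumption about the export rather than documented behavior. The sample messages follow the usual layout of Event ID 1000 entries, but their values are made up.

```python
import re
from collections import Counter

# Sample messages in the usual Application Error (Event ID 1000) layout.
# All names, versions, and codes here are invented for illustration.
messages = [
    "Faulting application name: winword.exe, version: 16.0.0.1, time stamp: 0x5f000000\n"
    "Faulting module name: ntdll.dll, version: 10.0.0.1, time stamp: 0x5e000000\n"
    "Exception code: 0xc0000005",
    "Faulting application name: excel.exe, version: 16.0.0.1, time stamp: 0x5f000001\n"
    "Exception code: 0xc0000409",
    "Faulting application name: winword.exe, version: 16.0.0.1, time stamp: 0x5f000000\n"
    "Exception code: 0xC0000005",
]

APP_RE = re.compile(r"Faulting application name:\s*([^,\r\n]+)", re.IGNORECASE)
CODE_RE = re.compile(r"Exception code:\s*(0x[0-9a-fA-F]+)", re.IGNORECASE)


def parse_crash(message):
    """Return (application, exception_code), or None if a field is missing."""
    app = APP_RE.search(message)
    code = CODE_RE.search(message)
    if not (app and code):
        return None
    return app.group(1).strip(), code.group(1).lower()


def crashes_per_app(msgs):
    """Count parsed crashes per application, most frequent first."""
    parsed = (parse_crash(m) for m in msgs)
    return Counter(p[0] for p in parsed if p).most_common()


if __name__ == "__main__":
    for app, n in crashes_per_app(messages):
        print(f"{app}: {n} crashes")
```

Grouping on the second element of each parsed pair instead would rank exception codes rather than applications, which can help separate one recurring fault from many unrelated ones.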

Conclusion

The ControlUp Incidents pane is useful for reporting incidents that have occurred on your managed systems. The information that is collected as a part of the logging process can also be useful for diagnostic purposes.