ControlUp Deep Dive – Triggers & E-mail Alerts

This blog post highlights ControlUp’s incident trigger and e-mail alerting feature. We explore the feature and its major use cases, then walk through creating a specific incident trigger for a typical use case. Covering the process from start to finish, we explain how an incident is detected, what happens after it is detected, and how to configure the resulting email alert. With this information, you will be able to configure an advanced incident trigger, including email alerts, in your own console.

ControlUp incident triggers let you know about important events in your network, whether you specify the events you want observed or ControlUp brings them to your attention, so that you can take proper action while an incident is taking place. Whenever a specific incident is detected, multiple follow-up actions can be carried out (e.g. email alerts, mobile push notifications to the Android or iOS applications, and event logs).

Don’t use ControlUp? Learn more about it here.

ControlUp Incident Triggers – Primary Use Cases

Some common incident trigger use cases include:

  1. Detecting when critical Windows services are no longer available (i.e. have crashed or stopped) – ControlUp can monitor particular Windows services so the appropriate actions can be taken.
  2. Uncovering host, server, or endpoint performance issues (e.g. when RDS servers or VDI endpoints exceed a stress level threshold) – ControlUp can detect when high resource usage places increased stress on systems or processes.
  3. Identifying application performance issues – ControlUp observes application stress levels and detects when an application or specific process (e.g. an .exe file) is consuming too much memory, I/O, or CPU.
  4. Monitoring specific application usage – Exposing certain metrics for particular applications (e.g. how many times the application has been opened and by whom).
  5. Catching specific Windows events – ControlUp triggers can detect when a specific event is created on any Windows-managed computer.

Creating an Advanced Application Performance Incident Trigger

Let’s begin our exploration of incident triggers with an application performance use case. One of our customers’ ERP applications started behaving erratically: whenever a user began performing an action, CPU usage would reach 30%. After a mere 30 seconds, this caused processes to jam up, preventing end users from using the application. As a result, the sysadmin wanted to be notified of this behavior and to have the accompanying information saved for further analysis and troubleshooting.

ControlUp monitors specific processes and enables the detection of similar cases, including notifying sysadmins when issues occur. ControlUp triggers can be configured with advanced filters to monitor a specific process at hand with a defined threshold metric, such as CPU utilization.

For this specific use case, we will simulate an ERP.EXE application process to show exactly how to create an incident trigger with valid parameters, as well as a follow-up email alert.

1. Stress Level Settings

Before creating the trigger, we need to check our stress level and adapt it to our specific scenario.

The ControlUp Stress Level incident type applies to all record types in ControlUp (Folders, Hosts, Computers, Sessions, Processes, Executables and Accounts). With the stress level trigger, we will be able to identify performance issues, such as excessive CPU consumption.

In the example below, we will configure the CPU stress level for processes.

Enter ControlUp Settings ⇒ Stress Settings ⇒ select the relevant folder (XenApp6.5 in our case) ⇒ Processes ⇒ CPU Settings:

We need to adjust the stress level settings to match our scenario. For example, to make the stress level critical when CPU usage is at 30% or higher for 30 seconds or more, change the duration to 30 seconds and the load to 6. With these changes, whenever any process running on the XenApp farm stays at 30% CPU for 30 seconds, its load value rises to 6, which is enough on its own, regardless of the other metrics, to switch the process to the critical stress level. In this case, both the duration and the load factor had to be changed to produce the desired behavior.
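To make the "threshold plus duration" idea concrete, here is a minimal Python sketch of how a sustained breach differs from a short spike. It is only a conceptual model and not ControlUp's actual implementation: the `Sample` type, the `first_sustained_breach` function, and the 10-second sampling interval are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Sample:
    t: float    # seconds since observation started (hypothetical field)
    cpu: float  # CPU usage of the process, in percent (hypothetical field)


def first_sustained_breach(samples: Iterable[Sample],
                           threshold: float = 30.0,
                           duration: float = 30.0) -> Optional[float]:
    """Return the time at which CPU has stayed at or above `threshold`
    for `duration` seconds without interruption, or None if it never does."""
    breach_start = None
    for s in samples:
        if s.cpu >= threshold:
            if breach_start is None:
                breach_start = s.t          # breach begins here
            if s.t - breach_start >= duration:
                return s.t                  # breach lasted long enough
        else:
            breach_start = None             # dropped below: start over
    return None


# A short spike that recovers, and a breach that persists (samples every 10 s).
spike = [Sample(0, 35), Sample(10, 40), Sample(20, 12), Sample(30, 33)]
sustained = [Sample(0, 10), Sample(10, 31), Sample(20, 45),
             Sample(30, 38), Sample(40, 50)]

print(first_sustained_breach(spike))      # None
print(first_sustained_breach(sustained))  # 40
```

The spike never qualifies because the dip at 20 seconds resets the timer, which is why setting the duration matters as much as setting the threshold.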

2. Create the Trigger

To create an incident trigger, click on the “Add Incident Trigger” button on the “ControlUp Management Console” window.

This step presents all of the available trigger options in the New Incident Trigger Wizard. You can then choose the incident trigger type that fits your specific use case (e.g. stress level, Windows event, computer down, process started).

For this particular use case, we will be using a stress level incident trigger, so select “Stress Level” and click Next.

Because we are monitoring a specific application process, the record type is ‘Process’, and the only stress level we are interested in is ‘Critical’. Select Record Type: “Process”, Stress Level: “Critical”, and Duration: leave the default (since the 30-second duration is already part of the Stress Level Settings), then click Next.

As seen above, the newly created incident trigger will detect any process that reaches the critical stress level. To narrow the trigger down to our specific process, we need to continue to the next step and use the filter editor. As shown below, we set the process name in the filter (ERP.EXE) and the specific CPU usage threshold (>=30%).

ERP.EXE only runs on XenApp 6.5 servers, so we will set the Scope to our XenApp6.5 folder and set the schedule to Weekdays (e.g. if you want incidents to be detected only between 9:00am and 6:00pm during the workweek), then click Next.
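Taken together, the filter, scope, and schedule act as a single condition that must hold before an incident is raised. The Python sketch below models that condition under stated assumptions: the `ProcessRecord` type and `matches_trigger` function are hypothetical, the case-insensitive name match is a guess, and the 9:00am-6:00pm window is the example schedule mentioned above. ControlUp evaluates all of this internally, so no scripting is required.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class ProcessRecord:
    name: str     # process image name, e.g. "ERP.EXE"
    folder: str   # ControlUp folder the host belongs to
    cpu: float    # current CPU usage in percent


def matches_trigger(proc: ProcessRecord, when: datetime) -> bool:
    """Model of the trigger condition: filter AND scope AND schedule."""
    name_ok = proc.name.upper() == "ERP.EXE"     # filter: process name (assumed case-insensitive)
    cpu_ok = proc.cpu >= 30.0                    # filter: CPU >= 30%
    scope_ok = proc.folder == "XenApp6.5"        # scope: XenApp6.5 folder only
    weekday_ok = when.weekday() < 5              # schedule: Monday to Friday
    hours_ok = time(9, 0) <= when.time() <= time(18, 0)  # schedule: 9:00am-6:00pm
    return name_ok and cpu_ok and scope_ok and weekday_ok and hours_ok


erp = ProcessRecord(name="ERP.EXE", folder="XenApp6.5", cpu=42.0)
print(matches_trigger(erp, datetime(2024, 1, 8, 10, 0)))  # Monday morning
print(matches_trigger(erp, datetime(2024, 1, 6, 10, 0)))  # Saturday morning
```

The first call falls inside the schedule and matches; the second is on a weekend, so the same process state would not raise an incident.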

If you don’t configure any follow-up actions, by default, whenever ControlUp detects that an ERP.EXE process in the XenApp 6.5 folder crosses the 30% CPU usage mark for the configured duration, the incident will be logged in the central incidents database. This allows you to go to the incidents pane later and view the details of that specific incident (e.g. how many times it happened, when, and which users were affected) in order to do a historical analysis.
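As an illustration of the kind of historical analysis this enables, the short Python sketch below counts ERP.EXE incidents per user. The records and their field names (`process`, `user`, `timestamp`) are invented for the example and do not reflect ControlUp's actual incident schema; in practice you would read these details from the incidents pane.

```python
from collections import Counter

# Hypothetical incident records, as they might look after being copied
# out of the incidents pane. Field names are assumptions for this sketch.
incidents = [
    {"process": "ERP.EXE", "user": "alice", "timestamp": "2024-01-08T10:02:00"},
    {"process": "ERP.EXE", "user": "bob",   "timestamp": "2024-01-08T11:47:00"},
    {"process": "ERP.EXE", "user": "alice", "timestamp": "2024-01-09T09:15:00"},
    {"process": "OTHER.EXE", "user": "carol", "timestamp": "2024-01-09T14:30:00"},
]

# How many times did the ERP.EXE incident happen, and for which users?
per_user = Counter(i["user"] for i in incidents if i["process"] == "ERP.EXE")

for user, count in per_user.most_common():
    print(f"{user}: {count} incident(s)")
```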

If you also want to be alerted in real time when the incident happens, select ‘Send an e-mail alert’ as the follow-up action and choose the relevant recipients, then click Next.
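ControlUp composes and sends the alert itself, so nothing needs to be scripted here. Purely to show the information such an alert typically carries, the Python sketch below builds (but does not send) a comparable message with the standard library. The `build_alert` function, the sender and recipient addresses, the server name, and the wording of the message are all hypothetical and are not ControlUp's actual alert format.

```python
from email.message import EmailMessage
from typing import List


def build_alert(process: str, computer: str, cpu_percent: float,
                recipients: List[str]) -> EmailMessage:
    """Build an illustrative alert e-mail; nothing is sent."""
    msg = EmailMessage()
    msg["Subject"] = f"Critical stress level: {process} on {computer}"
    msg["From"] = "alerts@example.com"       # hypothetical sender address
    msg["To"] = ", ".join(recipients)        # recipients chosen in the wizard
    msg.set_content(
        f"Process {process} on {computer} reached {cpu_percent:.0f}% CPU "
        "and entered the critical stress level."
    )
    return msg


alert = build_alert("ERP.EXE", "XA65-01", 42.0, ["sysadmin@example.com"])
print(alert["Subject"])
print(alert.get_content())
```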

Finally, enter a trigger name and click Finish.

The new trigger now appears in the trigger list. Click OK to close the window.

3. Watching it in Action – Viewing Incidents and Email Alerts

Once a new incident occurs, the incident trigger will generate an email alert, as seen below:

As seen above, the details of the incident are recorded in ControlUp’s historical incident database. The link in the email to the problematic process (ERP.EXE) leads to ControlUp’s incidents pane, as shown below:

You can double-click the highlighted ‘Process Stress’ line to drill down and see specific incidents:

Once the email alert is received, the sysadmin should check ControlUp’s real-time performance views, enter the XenApp6.5 folder, and filter by “erp” to reveal the problematic process, as shown below:

In this case, to prevent any additional performance issues, the sysadmin can simply right-click the process and kill it.

Check out our edoc to learn more about incident triggers.

Final Note

In this article, we saw how a ControlUp incident trigger can track advanced performance issues. With ControlUp incident triggers, sysadmins gain a great deal of flexibility with detailed monitoring capabilities that drive transparency and control over their environment’s performance.
