ControlUp Deep Dive – vSphere and XenServer Monitoring

ControlUp is a real-time IT operations platform designed specifically to monitor and manage RDS and VDI environments. ControlUp 4.0 introduced support for vSphere and XenServer, enabling real-time performance monitoring and management of the virtualization infrastructure, including VMware ESXi and Citrix XenServer hosts. In this blog post we explain how the Hypervisor Integration feature works, describe the recommended architecture for production environments, and present an advanced use case: troubleshooting an IOPS issue that affected the performance of a VDI farm.

ControlUp Hypervisor Integration Feature Architecture

How it works

ControlUp connects to the virtual infrastructure via the built-in Web services running on VMware vCenter servers or XenServer Pool Master hosts. By issuing remote API calls against the virtualization infrastructure, ControlUp can query real-time performance data from the hosts and execute management actions on the VMs without installing any agents on the vCenter server or on the ESXi / XenServer hosts themselves.

Hypervisor Data Collector Best Practices

The ControlUp data collector is the component responsible for connecting to the Hypervisor Web service and retrieving the real-time performance data via remote API calls. By default, each ControlUp console / Monitor service is configured as a data collector which pulls data directly from the connected hypervisors:

ControlUp default Hypervisor data collection mode

The default data collection mode above is suitable for POCs and small environments. However, in enterprise production environments this architecture might affect the performance of the vCenter server or XenServer host, because every instance of the ControlUp console and Monitor keeps pulling real-time performance data independently. Therefore, in production environments we recommend configuring a dedicated data collector that communicates with the hypervisors and acts as a proxy for all other ControlUp consoles and Monitor services, as shown below:

ControlUp recommended Hypervisor data collection mode with dedicated data collector/s
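
To make the difference between the two collection modes concrete, here is a minimal Python sketch that counts how many polling rounds reach the hypervisor. `FakeHypervisor`, `DedicatedCollector` and the IOPS figures are hypothetical stand-ins, not ControlUp, vSphere or XenServer APIs; the sketch only illustrates that N independent collectors cost N queries per interval, while a dedicated collector costs one and fans the result out to every console / Monitor.

```python
from typing import Callable, Dict, List


class FakeHypervisor:
    """Hypothetical stand-in for vCenter / a XenServer Pool Master.

    Each call to query_metrics() represents one round of remote API calls;
    the counter shows how much polling load the hypervisor absorbs."""

    def __init__(self) -> None:
        self.queries = 0

    def query_metrics(self) -> Dict[str, float]:
        self.queries += 1
        return {"ESX01": 310.0, "ESX02": 4200.0}  # made-up per-host IOPS


class DedicatedCollector:
    """Polls the hypervisor once per interval and hands the cached snapshot
    to every console / Monitor that reads from it (the proxy role)."""

    def __init__(self, fetch: Callable[[], Dict[str, float]]) -> None:
        self._fetch = fetch
        self._snapshot: Dict[str, float] = {}

    def poll(self) -> None:
        self._snapshot = self._fetch()  # the only call that reaches the hypervisor

    def read(self) -> Dict[str, float]:
        return dict(self._snapshot)  # served from the cache, no extra hypervisor load


def run_default_mode(hv: FakeHypervisor, clients: int, intervals: int) -> None:
    # Every console / Monitor is its own data collector and polls independently.
    for _ in range(intervals):
        for _ in range(clients):
            hv.query_metrics()


def run_dedicated_mode(hv: FakeHypervisor, clients: int, intervals: int) -> List[Dict[str, float]]:
    # One collector polls; all clients read its cached snapshot.
    collector = DedicatedCollector(hv.query_metrics)
    views: List[Dict[str, float]] = []
    for _ in range(intervals):
        collector.poll()
        views = [collector.read() for _ in range(clients)]
    return views


hv_default, hv_dedicated = FakeHypervisor(), FakeHypervisor()
run_default_mode(hv_default, clients=10, intervals=6)
run_dedicated_mode(hv_dedicated, clients=10, intervals=6)
print(hv_default.queries, hv_dedicated.queries)  # 60 6
```

With ten consoles / Monitors and six polling intervals, the default mode sends 60 query rounds to the hypervisor, while the dedicated collector sends 6, regardless of how many consoles read from it.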

In order to configure a dedicated data collector, simply edit the Hypervisor Connection and add a dedicated data collector under the Connection Options setting:

Hypervisor Connection settings screen

To enable high availability, we recommend configuring at least two dedicated data collectors. The second data collector acts as a standby: it starts communicating with the Hypervisor infrastructure only if the primary data collector fails.

You can keep the ‘ControlUp Console / Monitor’ object listed below the dedicated data collectors so that Hypervisor monitoring continues even if all dedicated data collectors fail.
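
The failover order described above can be modelled as "first healthy entry in the list wins". The sketch below assumes an ordered list of two hypothetical dedicated collectors (`DC01`, `DC02`) followed by the local ‘ControlUp Console / Monitor’ object. How ControlUp actually detects a failed collector is not covered in this post, so health is simply a caller-supplied function here.

```python
from typing import Callable, List, Optional

# The local console / Monitor entry left at the bottom of the list.
LOCAL_FALLBACK = "ControlUp Console / Monitor"


def pick_active_collector(
    ordered: List[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the entry that should talk to the hypervisor right now.

    Walks the configured order top to bottom: the primary wins while it is
    healthy, the secondary takes over only when the primary fails, and the
    local console / Monitor (assumed always able to poll for itself) is the
    last resort. Returns None if nothing in the list can collect."""
    for name in ordered:
        if name == LOCAL_FALLBACK or is_healthy(name):
            return name
    return None


order = ["DC01", "DC02", LOCAL_FALLBACK]  # DC01 / DC02 are hypothetical names
print(pick_active_collector(order, lambda n: True))              # DC01
print(pick_active_collector(order, lambda n: n != "DC01"))       # DC02
print(pick_active_collector(order, lambda n: False))             # ControlUp Console / Monitor
print(pick_active_collector(["DC01", "DC02"], lambda n: False))  # None
```

The last call shows why leaving the console / Monitor object at the bottom is useful: without it, losing both dedicated collectors leaves nobody polling the hypervisor.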

Hypervisor Credentials

The data collector requires valid credentials to authenticate against the virtualization infrastructure. vCenter / XenServer Read-Only permissions are sufficient for all real-time monitoring purposes. To enable VM power management actions, the configured credentials should also include power management rights on the hypervisor. For consistency and security reasons, ControlUp supports a single set of credentials per Hypervisor Connection, which means the same credentials need to be configured on every ControlUp console / Monitor computer. For this reason, we recommend creating a dedicated service account for the ControlUp Hypervisor Integration feature.
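
The credential rules above reduce to a subset check plus a consistency check. In this sketch the two rights, the action names and the account name `svc-controlup` are hypothetical simplifications, not actual vCenter privilege or XenServer role names.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Right(Enum):
    READ = "read"    # vCenter / XenServer Read-Only
    POWER = "power"  # power management rights on the hypervisor


# Hypothetical mapping of ControlUp feature -> rights the service account needs.
REQUIRED: Dict[str, FrozenSet[Right]] = {
    "monitor": frozenset({Right.READ}),
    "power_on": frozenset({Right.READ, Right.POWER}),
    "power_off": frozenset({Right.READ, Right.POWER}),
}


def allowed(action: str, granted: FrozenSet[Right]) -> bool:
    # An action is permitted only if every required right was granted.
    return REQUIRED[action] <= granted


def single_credential_set(configured: Dict[str, str]) -> bool:
    # configured maps each console / Monitor computer to the account it uses
    # for one Hypervisor Connection; all of them must use the same account.
    return len(set(configured.values())) <= 1


read_only = frozenset({Right.READ})
print(allowed("monitor", read_only))    # True
print(allowed("power_off", read_only))  # False
print(single_credential_set({"CONSOLE01": "svc-controlup", "MONITOR01": "svc-controlup"}))  # True
```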

Customer Use Case / Host and VM Metrics

Let’s review the following customer use case to illustrate the potential of the Hypervisor Integration feature and some of the available host and VM performance metrics (a full list of host and VM metrics is available in our edocs).

After receiving complaints from a couple of Horizon View end users about slow performance of their VDI endpoints, the View admin launched the ControlUp real-time grid to evaluate the current state of the ESXi hosts running the Horizon View VDI endpoints:

ControlUp Hosts View

The View admin quickly identified that a single host (ESX02) was at a critical stress level due to high IOPS consumption. Double-clicking the host entry in the data grid allows the admin to drill down and see which guest VMs are currently affected by the I/O bottleneck:

ControlUp Computers View
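
The host-level triage in this step can be sketched as classify-then-drill-down. The thresholds (`WARNING_IOPS`, `CRITICAL_IOPS`), the VM names, the ESX01 host and all IOPS numbers are invented for illustration; only ESX02 comes from the case. Host IOPS is approximated as the sum of its guests' IOPS, which is a simplification of what the hypervisor actually reports.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical thresholds for illustration only.
WARNING_IOPS = 1500
CRITICAL_IOPS = 3000


@dataclass
class Vm:
    name: str
    host: str
    iops: float


def stress_level(iops: float) -> str:
    if iops >= CRITICAL_IOPS:
        return "critical"
    if iops >= WARNING_IOPS:
        return "warning"
    return "normal"


def host_iops(vms: List[Vm]) -> Dict[str, float]:
    # Simplification: host IOPS approximated as the sum of its guests' IOPS.
    totals: Dict[str, float] = {}
    for vm in vms:
        totals[vm.host] = totals.get(vm.host, 0.0) + vm.iops
    return totals


def drill_down(vms: List[Vm], host: str) -> List[Vm]:
    # Guest VMs on one host, heaviest I/O first (the "double-click the host" step).
    return sorted((vm for vm in vms if vm.host == host), key=lambda vm: vm.iops, reverse=True)


vms = [
    Vm("VDI-01", "ESX01", 120), Vm("VDI-02", "ESX01", 95),
    Vm("VDI-03", "ESX02", 900), Vm("VDI-04", "ESX02", 850),
    Vm("VDI-05", "ESX02", 870), Vm("VDI-06", "ESX02", 880),
    Vm("VDI-07", "ESX02", 40),
]

for host, iops in sorted(host_iops(vms).items()):
    print(host, iops, stress_level(iops))
# ESX01 215.0 normal
# ESX02 3540.0 critical

critical = [h for h, i in host_iops(vms).items() if stress_level(i) == "critical"]
print([vm.name for vm in drill_down(vms, critical[0])])
# ['VDI-03', 'VDI-06', 'VDI-05', 'VDI-04', 'VDI-07']
```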

Having identified the four VDI endpoints suffering from poor performance and high IOPS usage, the admin then switched to the Processes view to identify the actual culprit:

ControlUp Processes View

Reviewing the Processes view, the admin found that a process called Dynamo.exe running on the affected VMs was causing the high IOPS consumption. Looking at the process command lines, he found that all four processes were accessing a server called CURDSH01. The next step was to take a screenshot of the user session running on CURDSH01 to figure out which component was triggering the Dynamo.exe processes:

ControlUp ‘Get Session Screenshot’ action result

That’s it! Using the ControlUp ‘Get Session Screenshot’ action, the View admin discovered that a rogue sysadmin was running Iometer to stress test the Horizon View storage subsystem in the middle of the working day, causing high IOPS usage and performance issues for multiple end users.
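
The Processes-view reasoning in this investigation (rank processes by I/O across the affected VMs, then check whether their command lines point at a common server) can be sketched as follows. The sample processes, IOPS values and VM names are made up, and the `/i <server>` switch on the Dynamo command line is an assumption about its format; only Dynamo.exe and CURDSH01 come from the case above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Proc:
    vm: str
    name: str
    iops: float
    cmdline: str


def top_process(procs: List[Proc]) -> Tuple[str, float]:
    """Sum IOPS per process name across all VMs and return the heaviest."""
    totals: Dict[str, float] = defaultdict(float)
    for p in procs:
        totals[p.name.lower()] += p.iops
    name = max(totals, key=lambda n: totals[n])
    return name, totals[name]


def target_server(cmdline: str, switch: str = "/i") -> Optional[str]:
    """Return the token after `switch` (assumed format: ... /i SERVER ...)."""
    args = cmdline.split()  # plain whitespace split; enough for switch/value pairs
    for i in range(len(args) - 1):
        if args[i].lower() == switch.lower():
            return args[i + 1]
    return None


def common_target(procs: List[Proc], name: str) -> Optional[str]:
    """The single server all `name` processes point at, or None if they differ."""
    targets = {target_server(p.cmdline) for p in procs if p.name.lower() == name}
    return targets.pop() if len(targets) == 1 else None


# Hypothetical snapshot of the four affected VMs.
dyn = r'"C:\Tools\Dynamo.exe" /i CURDSH01 /m {vm}'
procs = [
    Proc("VDI-03", "Dynamo.exe", 820, dyn.format(vm="VDI-03")),
    Proc("VDI-04", "Dynamo.exe", 810, dyn.format(vm="VDI-04")),
    Proc("VDI-05", "Dynamo.exe", 805, dyn.format(vm="VDI-05")),
    Proc("VDI-06", "Dynamo.exe", 830, dyn.format(vm="VDI-06")),
    Proc("VDI-03", "outlook.exe", 12, "outlook.exe"),
]

name, total = top_process(procs)
print(name, total)                 # dynamo.exe 3265.0
print(common_target(procs, name))  # CURDSH01
```

Finding a single shared target is what turns "four noisy VMs" into "one server to look at", which is where the session screenshot comes in.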

Final Note

In this blog post we described the ControlUp Hypervisor Integration feature architecture and some of the best practices for enterprise environments. We also presented a customer use case of troubleshooting an IOPS issue in a VMware Horizon View environment by using the different ControlUp views and actions.

For more information, please see this blog on ControlUp by Ivo Beerens, a VMware vExpert, who explains in detail how ControlUp can help a virtualization admin monitor and manage virtualization hosts and VDI environments.

ControlUp Product Team