ControlUp Monitor Cluster Sizing

Over the last several years, ControlUp Monitor clusters have become among the most important elements of a highly scalable ControlUp architecture. While one or two monitor VMs might be good enough for most proof-of-concept (POC) installations, large enterprise customers with multiple physical locations and tens of thousands of Citrix Virtual Apps and Desktops (CVAD) or VMware Horizon user sessions rely on adequate monitor clusters with dozens of nodes. The sizing numbers in this article give you an idea of the underlying design principles.

Have you ever installed the ControlUp Real-Time Console in your CVAD or VMware Horizon environments? If your answer is yes, there’s a good chance that you’ve also installed or configured the ControlUp Monitor component. In most proof-of-concept installations, this is a straightforward process and doesn’t require a lot of planning. But there is more. 

In the first part of this article, we’ll go over the exact purpose of each Monitor role. The second part will highlight the sizing of Monitor clusters for maximum scalability in production environments.

 

Monitor Fundamentals

From a technical perspective, ControlUp Monitor and the ControlUp Real-Time Console share the same core features, except for the interactive user interface. Once started, both sign in to a ControlUp organization using a sufficiently privileged Active Directory account, connect to managed data sources, and retrieve telemetry data from these data sources at frequent intervals. Data sources can be ControlUp Agents or third-party components with a publicly exposed programming interface.

Unlike the ControlUp Real-Time Console with its interactive user interface, the “headless” Monitor is implemented as a Windows service and runs in the background without user interaction. It continuously aggregates and processes collected data sets in memory before forwarding them to visualization and historical analysis components, such as ControlUp Insights or Solve. In addition, a Monitor can be configured to alert ControlUp users about incidents, to export data sets to disk, to store shared credentials for logging in to third-party data sources, and to execute automated actions. In essence, a ControlUp Monitor is like a digital heart pumping telemetry data and remediation commands.

NOTE: Sometimes, ControlUp customers or partners confuse the roles of the Monitor and the Agent. While there might be hundreds or thousands of ControlUp Agents deployed in an IT infrastructure, there is only a small number of Monitors. These receive performance updates from the Agents and may send remediation commands to selected Agents.

Even though it is possible to install the ControlUp Monitor component on almost any managed Windows machine in an organization’s IT infrastructure, we strongly recommend creating and using a dedicated Windows VM for each Monitor. If dimensioned properly, a single Monitor VM or “Monitor node” can handle thousands of data sources (e.g., 2,500 managed VMs with 160 processes per machine). The typical size of a Monitor VM is 2 to 8 vCPUs and 16 to 32 GB of memory. Once the Monitor service starts, the capacity of the Monitor is calculated according to the number of vCPUs and the amount of memory. Each data source that connects to the Monitor contributes an estimated virtual “weight” that eventually will fill the Monitor up to its capacity limit.

 

ControlUp Monitor Clusters

When infrastructures grow beyond the maximum capacity of a single Monitor node or when failover capabilities are required (N+1 configuration), additional Monitor nodes can be deployed. A cluster of Monitors is established automatically as soon as one or more Monitor nodes are added to an organization with an existing single Monitor node. The sizing of each Monitor cluster node is specified under: https://support.controlup.com/v1/docs/sizing-guidelines-for-controlup-v8-x

 

[Figure: ControlUp Monitor Cluster]

 

In a cluster of Monitors, one node takes the role of Master Monitor. The Master Monitor divides the organization’s monitoring tasks among the other Monitors in the cluster according to each Monitor’s capacity; all other Monitors in the cluster are subordinate to it. Because its built-in load-distribution algorithm is based on each Monitor VM’s available vCPUs and memory, a cluster can mix different VM sizes, although this is not recommended. You can find more details on deploying multiple Monitors in an organization at: https://support.controlup.com/docs/introduction-to-controlup-monitor-clusters-in-v8
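To make the idea of capacity-based distribution concrete, here is a minimal Python sketch. It is not ControlUp's actual algorithm, which is not described in this article. The `MonitorNode` class, the abstract capacity units, the per-source weights, and the "lowest resulting utilization" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MonitorNode:
    """Hypothetical model of a Monitor node; capacity is in made-up units."""
    name: str
    capacity: float
    load: float = 0.0
    sources: list[str] = field(default_factory=list)


def distribute(source_weights: dict[str, float], nodes: list[MonitorNode]) -> None:
    """Assign each data source to the node that ends up least utilized.

    Assumption: heaviest sources are placed first, and a node never
    accepts a source that would push it past its capacity.
    """
    ordered = sorted(source_weights.items(), key=lambda kv: kv[1], reverse=True)
    for source, weight in ordered:
        candidates = [n for n in nodes if n.load + weight <= n.capacity]
        if not candidates:
            raise RuntimeError(f"no Monitor node has room for {source}")
        target = min(candidates, key=lambda n: (n.load + weight) / n.capacity)
        target.load += weight
        target.sources.append(source)


if __name__ == "__main__":
    cluster = [MonitorNode("small", 100), MonitorNode("large", 200)]
    distribute({f"vm-{i}": 1.0 for i in range(30)}, cluster)
    for node in cluster:
        print(node.name, node.load, f"{node.load / node.capacity:.0%}")
```

In this toy model, 30 equally weighted sources split 10 to 20 between the two nodes, so both end up at the same utilization even though their sizes differ. Mixed sizes can work in principle under this model, but equal sizes are easier to reason about.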

As a rule of thumb, each node in a cluster of Monitors can handle up to 2,500 VDI machines or 5,000 concurrent sessions running on RDSH-based CVAD or VMware Horizon servers. We always recommend adding one more Monitor node for redundancy reasons. A Monitor Cluster Sizing Calculator is available at https://calc.controlup.com.
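The rule of thumb can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the two per-node figures quoted above plus one redundancy node. Adding VDI and RDSH load together as fractions of a node is an assumption for mixed environments, and the function name is hypothetical. For real designs, the official sizing calculator remains the reference.

```python
import math
from fractions import Fraction

# Per-node rule-of-thumb figures quoted in this article.
VDI_MACHINES_PER_NODE = 2500
RDSH_SESSIONS_PER_NODE = 5000


def estimate_monitor_nodes(vdi_machines: int, rdsh_sessions: int,
                           redundancy_nodes: int = 1) -> int:
    """Estimate the node count for a Monitor cluster (illustrative only).

    Assumption: VDI machines and RDSH sessions each consume a fraction
    of a node, and those fractions simply add up.
    """
    if min(vdi_machines, rdsh_sessions, redundancy_nodes) < 0:
        raise ValueError("inputs must be non-negative")
    # Fractions avoid floating-point rounding surprises before the ceiling.
    load = (Fraction(vdi_machines, VDI_MACHINES_PER_NODE)
            + Fraction(rdsh_sessions, RDSH_SESSIONS_PER_NODE))
    working_nodes = max(1, math.ceil(load))
    return working_nodes + redundancy_nodes


if __name__ == "__main__":
    print(estimate_monitor_nodes(vdi_machines=10_000, rdsh_sessions=0))
    print(estimate_monitor_nodes(vdi_machines=3_000, rdsh_sessions=1_000))
```

For example, 10,000 VDI machines fill four nodes exactly, so the estimate is five nodes once the redundancy node is added.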

 

[Figure: ControlUp Monitor Cluster sizing calculator]

 

Organizations with many tens of thousands of data sources need more than ten Monitor cluster nodes. This is the threshold at which you need to add one or more dedicated Master or Management Monitor nodes that are not connected to any data source. Another sizing rule applies in large organizations: if more than ten automation actions are executed with a large number of ControlUp Agents as their targets, add one more Monitor node for every three existing nodes in your Monitor cluster. Constantly checking the telemetry thresholds configured in triggers, executing automated actions, and sending remediation commands can be resource-intensive, which is why additional Monitor cluster capacity is required. Introducing dedicated “Scripting Monitor Nodes” is another way of addressing such a high-load scenario.
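The two large-environment rules above can be sketched as a small planning helper. Several details are assumptions, not ControlUp guidance: exactly one dedicated Master node is added (the text says "one or more"), the ten-node threshold is checked after the automation nodes are added, and "one per three existing nodes" is rounded down. The `ClusterPlan` and `plan_large_cluster` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    data_nodes: int               # nodes connected to data sources
    automation_nodes: int         # extra nodes for heavy automation load
    dedicated_master_nodes: int   # Master nodes with no data sources

    @property
    def total(self) -> int:
        return self.data_nodes + self.automation_nodes + self.dedicated_master_nodes


def plan_large_cluster(data_nodes: int, heavy_automation: bool) -> ClusterPlan:
    """Apply the two large-environment rules (illustrative only).

    heavy_automation stands for "more than ten automation actions
    targeting a large number of Agents".
    """
    if data_nodes < 1:
        raise ValueError("a cluster needs at least one data node")
    # Assumption: one extra node per complete group of three existing nodes.
    automation_nodes = data_nodes // 3 if heavy_automation else 0
    # Assumption: a single dedicated Master once the cluster exceeds ten nodes.
    masters = 1 if data_nodes + automation_nodes > 10 else 0
    return ClusterPlan(data_nodes, automation_nodes, masters)


if __name__ == "__main__":
    plan = plan_large_cluster(data_nodes=12, heavy_automation=True)
    print(plan, "total:", plan.total)
```

With 12 data nodes and heavy automation, this sketch adds four automation nodes and one dedicated Master, for 17 nodes in total.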

The performance of a Monitor cluster can be further optimized in several ways:

 

  1. In large environments, ControlUp Data Collectors can be introduced to increase scalability when retrieving data from third-party data sources, such as VMware vCenter, Citrix Delivery Controllers, Citrix XenServer Pool Masters, Nutanix AHV Clusters, and Citrix ADC (formerly NetScaler) appliances. A Data Collector acts as a third-party data aggregator and, from a Monitor’s perspective, it represents only a single data source.

    Learn more: https://www.controlup.com/resources/blog/entry/what-is-a-data-collector-and-why-does-it-matter/
  2. By default, all folders, hypervisors, and computers are monitored. To exclude a data source from the Monitor cluster, right-click it in the organization tree view in the Console, select Properties, and then set “Exclude from ControlUp Monitor” to Yes.

 

[Figure: Exclude a group of data sources from ControlUp Monitor]

 

Monitors work best when they are at the same location as the data sources they are monitoring, as it minimizes latency in the collection of data from those sources. In organizations with multiple locations, a Site can be created for each distinct physical location. The Site should be configured to include all the Monitors and all the data sources they monitor in that location. The Master Monitor will only task Monitors in a given site with the job of collecting data from the data sources in that site. With multiple Sites, it may be necessary to deploy additional Monitor nodes for site-specific scalability and redundancy reasons.
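Because Monitors in one Site only collect from data sources in that Site, each Site is effectively sized on its own. The sketch below illustrates why splitting an environment into Sites can raise the total node count. It reuses the 2,500 VDI machines per node rule of thumb from this article. The per-Site redundancy node, the site names, and the function name are assumptions.

```python
VDI_MACHINES_PER_NODE = 2500  # rule-of-thumb figure quoted in this article


def nodes_per_site(vdi_by_site: dict[str, int],
                   redundancy_nodes: int = 1) -> dict[str, int]:
    """Size each Site independently (illustrative only).

    Assumption: every Site gets its own redundancy node, because a spare
    node in one Site is not tasked with data sources of another Site.
    """
    plan = {}
    for site, vdi_machines in vdi_by_site.items():
        # Integer ceiling division: round partial nodes up.
        working = max(1, -(-vdi_machines // VDI_MACHINES_PER_NODE))
        plan[site] = working + redundancy_nodes
    return plan


if __name__ == "__main__":
    split = nodes_per_site({"site-a": 3000, "site-b": 3000})
    single = nodes_per_site({"all-in-one": 6000})
    print(split, sum(split.values()))
    print(single, sum(single.values()))
```

In this example, two Sites with 3,000 VDI machines each need three nodes per Site, six in total, while the same 6,000 machines in a single Site would need four. Per-Site rounding and per-Site redundancy account for the difference.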

EXAMPLE FROM THE FIELD: A large, international enterprise customer with 55,000 concurrent VMware Horizon user sessions has 44 ControlUp Monitor nodes, including several dedicated Master Monitors and several Scripting Monitors. Monitor cluster sizing across multiple Sites is based on high redundancy requirements and a maximum capacity of 3,500 data sources per Monitor node due to scheduled or advanced triggers sending remediation commands to a large number of targets.

Over the past few years, ControlUp Monitor clusters have become among the most important elements of a scalable ControlUp architecture. While one or two Monitor VMs might be sufficient for most POC installations, large enterprise customers with multiple physical locations and tens of thousands of CVAD or VMware Horizon user sessions rely on adequate Monitor clusters with dozens of nodes. The sizing numbers in this article give you an idea of the underlying design principles.

For details and more sophisticated design guidelines, please contact ControlUp Professional Services.