Here at ControlUp we periodically perform big data research on usage metrics in the wild. Our product lets systems administrators compare their IT metrics to the “community”, which is the aggregate of IT systems in organizations being managed by ControlUp. With the permission of our customers, we provide benchmarks on a variety of metrics, such as logon duration, application load time, protocol latency, and application usage. The data includes the resources allocated to each virtual machine and what was actually used.
For this research
As far as we know, no one has ever done a big data analysis to answer these questions. So off we went.
What did we find?
The report presents extensive results, but here are some highlights.
We also sliced the data based on OS versions and had 3 key takeaways:
a) Newer OSes are indeed more efficient than their recent predecessors (values in blue in the chart below).
b) It appears people know that fact and
c) The ratio of over-provisioning (orange line) remains crazy high.
Again, here are some server results, and the report has detailed statistics on both server and desktop OSes.
Why does this matter?
For memory it’s straightforward: the cost of over-provisioning is the cost of the memory itself. Allocating 1GB of RAM to a VM blocks off that memory from the other VMs running on that hardware. Based on this concept, we found an average of $118 spent on memory over-provisioning per VM, which added up to $26.4m across our data set.
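To make the memory arithmetic concrete, here is a minimal sketch of how a per-VM over-provisioning cost could be estimated from allocated versus used memory. It is not the method used in the report: the price per GB, the 20% headroom, and the sample VMs are all made-up assumptions for illustration.

```python
from dataclasses import dataclass

# Assumption: a flat hardware cost per GB of RAM. The post does not state
# the price it used, so this figure is purely illustrative.
ASSUMED_COST_PER_GB_USD = 10.0


@dataclass
class VmMemorySample:
    name: str             # hypothetical VM name
    allocated_gb: float   # memory assigned to the VM
    peak_used_gb: float   # highest memory use observed


def overprovisioned_gb(vm: VmMemorySample, headroom: float = 0.2) -> float:
    """GB allocated beyond peak use plus a safety margin (never negative)."""
    needed_gb = vm.peak_used_gb * (1 + headroom)
    return max(0.0, vm.allocated_gb - needed_gb)


def overprovisioning_cost(vms, cost_per_gb=ASSUMED_COST_PER_GB_USD, headroom=0.2):
    """Total dollar value of memory that is blocked off but not needed."""
    return sum(overprovisioned_gb(vm, headroom) * cost_per_gb for vm in vms)


# Made-up fleet, not data from the report.
fleet = [
    VmMemorySample("web-01", allocated_gb=16.0, peak_used_gb=5.0),
    VmMemorySample("db-01", allocated_gb=32.0, peak_used_gb=25.0),
    VmMemorySample("app-01", allocated_gb=8.0, peak_used_gb=7.5),
]

total = overprovisioning_cost(fleet)
print(f"total: ${total:.2f}, average per VM: ${total / len(fleet):.2f}")
```

The headroom parameter is there because sizing a VM to exactly its observed peak would leave no safety margin; a VM that already runs close to its allocation, like app-01 here, contributes nothing to the total.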
The cost of CPU over-provisioning is a bit more nuanced. While there is obviously a cost impact of buying more processors than needed, the actual dollar value is hard, if not impossible, to measure because each physical core is shared across VMs. While CPU schedulers generally do a good job, there is a performance penalty here. The paper discusses this in detail, but at a high level, allocating more vCPUs than needed is analogous to asking a restaurant for a table for 8 when you have a party of 2: you’ll wait longer for the table and other patrons will suffer. (Yes, NUMA alleviates this to an extent but…read the paper 🙂)
What do I do about it?
Touching a system that “ain’t broke” can be a risky proposition. And removing resources instead of adding them? That’s crazy talk! In the paper we discuss some practical and prescriptive steps for navigating these waters.