The IOPS Battle Continues: PVS vs MCS

Ever since Citrix released the new Provisioning Services cache in RAM with disk overflow feature, there have been spirited debates about its effectiveness and its potential for the future. In fact, Citrix claims to have put storage vendors “on notice”.

Although Citrix has been heavily promoting the new feature, there haven’t been many independent tests verifying the effectiveness of the new cache in real-world scenarios. A recent ESG Labs report claims that the RAM cache with disk overflow feature will reduce VDI and RDS related storage costs by up to 80%.

As an organization that works closely with Citrix technologies and enterprise customers, we decided to conduct our own independent testing to determine the IOPS consumption rate when working with PVS and MCS on XenDesktop 7.6 VDI workloads.

Let’s Get Straight to the Point

Let’s begin with the fun stuff – the results! For this independent test we evaluated three distinct tiers – the hypervisor, the virtual machine, and the guest operating system.

Hypervisor Tier Test Results

When examining the hypervisor tier, we measured the number of IOPS that all virtual machines collectively produced against the underlying Hypervisor Datastore. The following chart describes the IOPS consumption trend over time.

Sum of RW IOPS from all VMs at each point in time over the Login VSI load time

The above chart illustrates the Hypervisor Datastore IOPS consumption while running three similar Login VSI tests, each using a different XenDesktop provisioning technology: MCS, PVS with disk-based caching, and PVS cache in RAM with disk overflow (PVS RAM cache from now on) with a 256 MB RAM cache size.

The major spike in IOPS during the first part of the test can be attributed to the logon storm.
Login VSI is configured to perform one login every fifteen seconds until 20 logins have occurred. The entire process took 300 seconds to complete. The user profile configuration was based on the default Windows settings (local profiles, no folder redirection, UPM was not configured). The average user profile size was 5MB.

Although a spike in IOPS during the login storm is to be expected, no such spike occurred when PVS RAM cache was used. In fact, with the PVS RAM cache option the demand for IOPS remained near zero throughout the logon storm and rose only slightly as the test progressed. In contrast, MCS consumed around 100 IOPS per VM at its peak, while PVS with disk-based caching consumed just over half as many IOPS as MCS, at around 55 IOPS per VM.
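For readers who want to rebuild this kind of datastore-level view from their own per-VM samples, here is a minimal sketch. The CSV layout and column names are assumptions made for the example, not the exact format exported by ControlUp or vCenter.

```python
# Minimal sketch: sum read + write IOPS across all VMs at each sample time
# to produce a datastore-level consumption trend like the chart above.
# The file name and columns (timestamp, vm, read_iops, write_iops) are
# assumptions for illustration, not an actual ControlUp/vCenter export format.
import csv
from collections import defaultdict

totals = defaultdict(float)  # timestamp -> sum of RW IOPS across all VMs

with open("vm_iops_samples.csv", newline="") as f:  # hypothetical export file
    for row in csv.DictReader(f):
        totals[row["timestamp"]] += float(row["read_iops"]) + float(row["write_iops"])

for ts in sorted(totals):
    print(ts, f"{totals[ts]:.0f} total RW IOPS")
```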

Virtual Machine Tier Test Results

When evaluating the virtual machine tier, we measured the number of IOPS that each virtual machine generated against the host Datastore and based our findings on the mean value.

Average IOPS per VM for every provisioning method

The chart above describes the average IOPS consumption of a single VM based on the vCenter Datastore I/O metrics. Both PVS caching methods generate low read IOPS because PVS streams the vDisk blocks over the network. MCS, on the other hand, reads the base image blocks via the storage subsystem, hence the higher level of read IOPS.

Both the MCS and PVS disk methods write directly to the disk subsystem and produce relatively high levels of IOPS. In contrast, the PVS RAM cache method writes directly to RAM until the overflow-to-disk mechanism kicks in.

When we examine the 95th percentile (shown in the chart below), we can see that MCS continues to yield a high level of IOPS, which is to be expected. PVS with RAM-based caching produced almost no write IOPS and a very low level of read IOPS. PVS with disk-based caching produced a level of write IOPS identical to MCS, but a slightly lower level of read IOPS than PVS with RAM-based caching.

Representative of the top 5% IOPS for every provisioning method
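For completeness, here is a minimal sketch of how the per-VM average and 95th percentile can be derived from a series of IOPS samples. The sample values are made up for illustration; in our tests the inputs came from the vCenter Datastore I/O metrics collected by ControlUp.

```python
# Minimal sketch: compute the average and 95th percentile of per-VM IOPS
# samples. The sample values below are invented for illustration only.
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile; good enough for reporting purposes."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

write_iops_samples = [12, 15, 9, 48, 101, 87, 14, 11, 19, 95]  # made-up values

print("average write IOPS         :", statistics.mean(write_iops_samples))
print("95th percentile write IOPS :", percentile(write_iops_samples, 95))
```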

Guest Operating System Tier Test Results

Our operating system tier results were based on the level of IOPS that the guest operating system was able to perform. We wanted to make sure that the underlying infrastructure was able to deliver a sufficient level of IOPS to keep pace with the demands of the guest operating system. Therefore, higher recorded levels of IOPS reflect better performance.

Consumption of disk read and write operations per second from the OS perspective

We based our evaluation of the guest operating system on the Windows Performance Monitor Disk Writes/sec and Disk Reads/sec counters. In the above results, we can see that PVS disk cache achieved roughly three times the write IOPS of PVS RAM cache or MCS, both of which had very similar levels of write IOPS. There is an interesting explanation for this behavior: the new PVS RAM cache architecture uses a 2 MB block size compared to the 4 KB block size used in previous methods. This means that PVS RAM cache can achieve the same write throughput with significantly fewer IO operations.
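To put that block-size effect in perspective, here is a hypothetical worked example; the 100 MB/s throughput target is an arbitrary figure chosen for illustration, not a value measured in these tests.

```python
# Hypothetical illustration of the block-size effect: the IOPS required to
# sustain a given write throughput at 4 KB versus 2 MB block sizes.
# The 100 MB/s target is an arbitrary example, not a measured value.

def iops_for_throughput(throughput_mb_s: float, block_size_kb: float) -> float:
    """IO operations per second needed to move throughput_mb_s MB/s in blocks of block_size_kb KB."""
    return (throughput_mb_s * 1024) / block_size_kb

target = 100  # MB/s, example only
print(f"4 KB blocks: {iops_for_throughput(target, 4):,.0f} IOPS")     # ~25,600 IOPS
print(f"2 MB blocks: {iops_for_throughput(target, 2048):,.0f} IOPS")  # ~50 IOPS
```

The same amount of data moved in 2 MB blocks translates into orders of magnitude fewer IO operations, which is why the OS-level counters for PVS RAM cache look modest even though throughput does not suffer.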

It’s worth mentioning that the Login VSI test did not stretch the PVS RAM cache IOPS limits.

In order to test the PVS RAM cache IOPS limits and compare them to MCS, we ran IOmeter on a single PVS VM and a single MCS VM. We based the IOmeter configuration on Jim Moyle’s VDI IO Profile in an attempt to achieve results as similar as possible to a real VDI workload.
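For readers without IOmeter at hand, the sketch below generates a roughly VDI-like load in the same spirit: write-heavy, small, mostly random IOs. The 80/20 write/read split and 4 KB block size are our assumed characterization of a VDI profile, not Jim Moyle’s exact settings, and because the script goes through the OS page cache it should be treated as an illustration rather than a benchmark.

```python
# Rough sketch of a synthetic, VDI-like IO generator (illustration only).
# Assumptions: ~80% writes / 20% reads, random 4 KB IOs, a test file kept
# smaller than the 256 MB RAM cache (mirroring the note about the IOmeter
# test file below). Without O_DIRECT/raw device access the OS page cache
# absorbs much of this IO, so the reported number is not comparable to IOmeter.
import os
import random
import time

FILE_PATH = "iotest.bin"          # hypothetical test file
FILE_SIZE = 128 * 1024 * 1024     # 128 MB, smaller than the 256 MB RAM cache
BLOCK = 4096                      # 4 KB IOs
WRITE_RATIO = 0.8                 # ~80% writes, ~20% reads
DURATION = 10                     # seconds

with open(FILE_PATH, "wb") as f:  # pre-create the test file
    f.truncate(FILE_SIZE)

buf = os.urandom(BLOCK)
ops = 0
deadline = time.time() + DURATION

with open(FILE_PATH, "r+b") as f:
    while time.time() < deadline:
        f.seek(random.randrange(0, FILE_SIZE - BLOCK))
        if random.random() < WRITE_RATIO:
            f.write(buf)          # random 4 KB write
        else:
            f.read(BLOCK)         # random 4 KB read
        ops += 1

print(f"~{ops / DURATION:,.0f} IO operations per second (page-cached, illustrative only)")
```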

The following ControlUp screenshot shows the IO usage results on both VMs while IOmeter was running:

We can see that the PVS RAM cache VM (W7-IOPS-PVS02) can drive 28,000 IOPS without affecting the ESXi Datastore, while the MCS VM (W7-IOPS-MCS02) can drive only 1,600 IOPS, which is in fact the limit of the NetApp storage in our lab environment. It’s worth mentioning that the IOmeter test file size was smaller than the configured RAM cache size. (Thanks, Jim, for catching this and giving your feedback.)

Login VSI Results

Our testing showed that PVS RAM cache consistently yielded a lower level of IOPS than either PVS with disk-based caching or MCS. As expected, the highest level of IOPS was attributed to MCS, with PVS with disk-based caching producing a level of IOPS midway between the other two methods.

Lab Setup

For our tests, we created 20 64-bit Windows 7 SP1 virtual desktops running on top of VMware vSphere ESXi 5.5. The virtual desktops were stored on an NFS file share.

Hypervisor Hardware: Dell R620 with 2 x 6-core CPUs and 192 GB RAM (during the tests, only the XenDesktop VDI endpoints were running on this host)
Storage: NetApp FAS2040 running Data ONTAP 8.1.2, 24 x 15K RPM disks
Additional configuration specifications are listed in the table below:

Our tests were based on Login VSI, which enabled us to automate logins to our virtual desktops. Performance metrics were collected using ControlUp 4.1.

Testing Methodology

Our tests consisted of a series of 30-minute segments during which the virtual users logged on and executed the Login VSI medium workload script. Login VSI was configured to perform a logon every fifteen seconds, so the first 5 minutes of each test produced a login storm (20 virtual desktops * 15 seconds = 300 seconds).

The virtual desktops were configured as pooled random, and each user session performed a first-time logon. Furthermore, a reboot was performed prior to each segment to flush the Windows cache and eliminate the possibility that the requested data had already been cached.

The PVS RAM cache size was set to 256 MB based on Amit Ben-Chanoch’s ‘PVS RAM cache overflow size matters’ blog post. We repeated the tests with 60 XenDesktop VMs (instead of 20) and observed similar results.

Conclusions

The new PVS RAM cache mode is largely independent of the underlying physical Datastore (unless disk overflow kicks in) while delivering the fastest logon times. Compared to the other PVS alternatives, the new architecture gives PVS RAM cache a small advantage even in read IOPS.

With MCS, every I/O operation hits the Hypervisor Datastore, so performance depends entirely on the capabilities of the storage. Low- and mid-range storage systems will deliver inferior IOPS performance compared to PVS RAM cache.

PVS RAM cache casts new light on the long-running PVS vs. MCS debate among Citrix experts. By combining the best of both worlds (fast RAM for caching, stable disk for overflow, and an improved driver architecture), PVS RAM cache offers a viable and affordable option for many VDI implementations.