
The IOPS Battle Continues: PVS vs MCS

Ever since Citrix released the new Provisioning Services cache in RAM with disk overflow, there have been some spirited debates regarding the effectiveness of this new feature and its potential for the future. In fact, Citrix claims to have put storage vendors “on notice”.

Although Citrix has been heavily promoting the new feature, there haven’t been a lot of independent tests conducted as a way of verifying the effectiveness of the new cache in real world scenarios. A recent ESG Labs report claims that the RAM cache with disk overflow feature will reduce VDI and RDS related storage costs by up to 80%.

As an organization that works closely with Citrix technologies and enterprise customers, we decided to conduct our own independent testing as a way of determining the IOPS consumption rate when working with PVS and MCS on XenDesktop 7.6 VDI workloads.

Let’s Get Straight to the Point

Let’s begin with the fun stuff – the results! For this independent test we evaluated three distinct tiers – the hypervisor, the virtual machine, and the guest operating system.

Hypervisor Tier Test Results

When examining the hypervisor tier, we measured the number of IOPS that all virtual machines collectively produced within the underlying hypervisor datastore. The following chart describes the IOPS consumption trend over time.

IOPS consumption trend over time
Sum of R/W IOPS from all VMs at each point in time over the Login VSI load time

The above chart illustrates the Hypervisor Datastore IOPS consumption while running 3 similar Login VSI tests. Each uses a different XenDesktop provisioning technology – MCS, PVS with disk based caching, and PVS cache in RAM with disk overflow (PVS RAM cache from now on) set to 256MB RAM cache size.
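As a rough illustration of how a chart series like this can be derived, the sketch below sums read and write IOPS across all VMs for each sampling point. The sample tuples, VM names, and the `total_iops_by_time` helper are hypothetical and invented for this example; they are not the measured data or the tooling used in the test.

```python
from collections import defaultdict

# Hypothetical per-VM samples: (seconds since test start, VM name, read IOPS, write IOPS).
# These values are made up for illustration and are NOT the measured results.
samples = [
    (0, "vm01", 5, 20),
    (0, "vm02", 3, 15),
    (15, "vm01", 40, 60),
    (15, "vm02", 35, 55),
]


def total_iops_by_time(rows):
    """Sum read + write IOPS across all VMs for each timestamp."""
    totals = defaultdict(int)
    for timestamp, _vm, reads, writes in rows:
        totals[timestamp] += reads + writes
    # Return a plain dict ordered by timestamp so it can be plotted directly.
    return dict(sorted(totals.items()))


series = total_iops_by_time(samples)
print(series)
```

Each key is a point in time and each value is the combined load that the datastore sees from every desktop at that moment.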

The major spike in IOPS during the first part of the test can be attributed to the logon storm.
Login VSI is configured to perform one login every fifteen seconds until 20 logins have occurred. The entire process took 300 seconds to complete. The user profile configuration was based on the default Windows settings (local profiles, no folder redirection, UPM was not configured). The average user profile size was 5MB.

Although a spike in IOPS during the login storm is to be expected, it did not occur when PVS RAM cache was used. In fact, with the PVS RAM cache option the demand for IOPS remained near zero throughout the login storm and rose only slightly as the test progressed. In contrast, MCS consumed around 100 IOPS per VM at its peak, while PVS with disk-based caching consumed just over half as many, at around 55 IOPS per VM.

Virtual Machine Tier Test Results

When evaluating the virtual machine tier, we noted the number of IOPS that each virtual machine made against the host Datastore and then based our findings on the mean value.

average IOPS consumption of single VM
Average IOPS per VM for every provisioning method

The chart above describes the average IOPS consumption of a single VM based on the vCenter Datastore I/O metrics. Both PVS caching methods utilize low read IOPS due to the fact that PVS reads the vDisk blocks over the network. On the other hand, MCS reads the base image blocks via the storage subsystem hence the higher level of read IOPS.

Both MCS and PVS disk methods write directly to the disk subsystem and produce relatively high levels of IOPS. In contrast, the PVS RAM cache method writes directly to RAM until the overflow-to-disk mechanism kicks in.

When we examine the 95th percentile (shown in the chart below), we can see that MCS continues to yield a high level of IOPS, which is to be expected. PVS with RAM based cache was shown to produce almost no write IOPS, and a very low level of read IOPS. PVS with disk based caching produced a level of write IOPS that was identical to MCS, but had a slightly lower level of read IOPS than PVS with RAM based caching.

95th Percentile of Read & Write Requests per VM
Representative of the top 5% IOPS for every provisioning method
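To make the difference between the mean and the 95th percentile concrete, here is a minimal sketch using the nearest-rank percentile method. The sample values and the `percentile_nearest_rank` helper are hypothetical, and we are assuming nearest-rank purely for illustration; other percentile definitions interpolate between samples and may give slightly different numbers.

```python
import math
from statistics import mean


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    ordered = sorted(values)
    # Nearest-rank: the smallest value such that at least pct% of samples are <= it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical write IOPS samples for a single VM (illustrative only).
write_iops = [2, 3, 3, 4, 4, 5, 5, 6, 8, 90]

avg = mean(write_iops)
p95 = percentile_nearest_rank(write_iops, 95)
print(f"mean={avg}, 95th percentile={p95}")
```

With a bursty sample like this one, the mean hides the peak while the 95th percentile exposes it, which is why both views are useful when sizing storage.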

Guest Operating System Tier Test Results

Our operating system tier results were based on the level of IOPS that the guest operating system was able to perform. We wanted to make sure that the underlying infrastructure was able to deliver a sufficient level of IOPS to keep pace with the demands of the guest operating system. Therefore, higher recorded levels of IOPS reflect better performance.

Consumption of disk read and write operations
Consumption of disk read and write operations per second from the OS perspective

We based our evaluation of the guest operating system on the Windows Performance Monitor Disk Writes/sec and Disk Reads/sec counters. In the above results, we can see that PVS disk cache achieved roughly three times the write IOPS of PVS RAM cache or MCS, both of which had very similar levels of write IOPS.

There is an interesting explanation for this behavior. The new PVS RAM cache architecture uses a 2 MB block size compared to the 4 KB block size used in previous methods. This means that PVS RAM cache can achieve the same write throughput with significantly fewer I/O operations.
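The block-size effect can be shown with simple arithmetic. The sketch below assumes, as a simplification, that every I/O operation transfers exactly one full block; the 10 MiB/s throughput figure and the `ops_for_throughput` helper are hypothetical and chosen only to keep the numbers easy to follow.

```python
KIB = 1024
MIB = 1024 * KIB


def ops_for_throughput(bytes_per_second, block_size_bytes):
    """Operations per second needed to move the given throughput,
    assuming each operation transfers exactly one full block."""
    return bytes_per_second / block_size_bytes


throughput = 10 * MIB  # hypothetical sustained write throughput

ops_4k = ops_for_throughput(throughput, 4 * KIB)
ops_2m = ops_for_throughput(throughput, 2 * MIB)
ratio = ops_4k / ops_2m

print(f"4 KB blocks: {ops_4k:.0f} ops/s")
print(f"2 MB blocks: {ops_2m:.0f} ops/s")
print(f"ratio: {ratio:.0f}x")
```

Under this simplified model the same throughput needs 512 times fewer operations with 2 MB blocks. Real workloads rarely fill every block, so the practical reduction is likely to be smaller.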

It’s worth mentioning that the Login VSI test did not stretch the PVS RAM cache IOPS limits.

In order to test the PVS RAM cache IOPS limits and compare them to MCS, we ran IOmeter on a single PVS VM and a single MCS VM. We based the IOmeter configuration on Jim Moyle’s VDI IO Profile in an attempt to achieve results that were as similar as possible to a real VDI workload.

The following ControlUp screenshot shows the IO usage results on both VMs while IOmeter was running:

IO usage results

We can see that the PVS RAM cache VM (W7-IOPS-PVS02) can utilize 28,000 IOPS without affecting the ESXi Datastore while the MCS VM (W7-IOPS-MCS02) can utilize only 1,600 IOPS, which is actually the NetApp storage limit in our lab environment. It’s worth mentioning that the IOmeter test file size was smaller than the configured RAM cache size. (Thanks Jim for catching this and giving your feedback.)

Login VSI Results

Our testing concluded that PVS RAM cache consistently yielded a lower level of IOPS than PVS with disk based caching or MCS. As expected, the highest level of IOPS was attributed to MCS, with PVS with disk based caching producing a level of IOPS that was midway between the other two methods.

Login VSI Results

Lab Setup

For our tests, we created twenty 64-bit Windows 7 SP1 virtual desktops running on top of VMware vSphere ESXi 5.5. The virtual desktops were stored on an NFS file share.

Hypervisor Hardware: Dell R620 with 2×6 Core CPUs and 192GB RAM (during the tests only the XenDesktop VDI endpoints were running on this host)
Storage: NetApp FAS2040 DOT 8.1.2, 24 15K rpm disks
Additional configuration specifications listed in the table below:

Configuration Specifications PVS vs MCS

Our tests were based on Login VSI, which enabled us to automate logins into our virtual desktops. Performance metrics were collected by using ControlUp 4.1.

Testing Methodology

Our tests were based on a series of 30 minute segments during which the virtual user logged on and executed the Login VSI medium workload script. Login VSI was configured to perform a logon every fifteen seconds. As such, the first 5 minutes of each test produced a login storm (20 virtual desktops * 15 seconds = 300 seconds).
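The login storm arithmetic above can be written as a tiny helper. This is only a sketch of the formula quoted in the paragraph; the `login_storm_seconds` function is a hypothetical name, and the 60-desktop figure assumes the same 15-second interval, which we did not state explicitly for the larger run.

```python
def login_storm_seconds(desktops, interval_seconds):
    """Length of the login storm using the same formula as the text:
    number of desktops multiplied by the interval between logins."""
    return desktops * interval_seconds


SEGMENT_SECONDS = 30 * 60  # each test segment lasted 30 minutes

storm_20 = login_storm_seconds(20, 15)
storm_60 = login_storm_seconds(60, 15)  # assumption: same 15-second interval

print(f"20 desktops: {storm_20} s ({storm_20 / SEGMENT_SECONDS:.0%} of a segment)")
print(f"60 desktops: {storm_60} s ({storm_60 / SEGMENT_SECONDS:.0%} of a segment)")
```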

The virtual desktops were configured as pooled random, and each user session performed a first time logon. Furthermore, a reboot was performed prior to each segment in order to flush the Windows cache as a way of eliminating the possibility that the requested data had already been cached.

The PVS RAM cache size was set to 256 MB based on Amit Ben-Chanoch’s blog post ‘PVS RAM cache overflow size matters’. We repeated the tests with 60 XenDesktop VMs (instead of 20) and witnessed similar results.

Conclusions

The new PVS RAM cache mode is pretty much independent from the underlying physical Datastore (unless disk overflow kicks in) while delivering the fastest logon times. In comparison to other PVS alternatives, the new architecture gives the PVS RAM cache a small advantage even in Read IOPS.

With MCS every I/O operation occurs in the Hypervisor Datastore and the performance completely depends on the storage capabilities. Low and mid-range storage systems will provide inferior IOPS performance compared to PVS RAM cache.

PVS RAM cache sheds new light on the PVS vs MCS argument among Citrix experts. By combining fast RAM for caching, stable disk for when overflow is needed, and an improved driver architecture, PVS RAM cache offers a viable and affordable alternative for many VDI implementations.


Comments (8)

  • Lorscheider Santiago | April 9, 2015 at 1:26 pm

    Perfect post! Congratulations!
    It would be interesting to run the MCS test on XenServer 6.5 with the Read Cache enabled.

  • T. Kreidl | April 18, 2015 at 9:19 pm

    I agree with Lorscheider. Comparing an old version of MCS to the shiny, new cache in RAM with overflow to disk on PVS has the expected outcome. The XS 6.5 extra cache available for dom0 for the Platinum XenDesktop and the Enterprise editions should help some. Bringing the cache in RAM with overflow to disk to MCS is highly desirable and should help level the playing field, and is hopefully at least under consideration.

  • Paul Stansel | May 21, 2015 at 10:25 am

    While nice data to have from a storage perspective, this doesn’t at all address the management differences between the two as well as the infrastructure requirements of PVS. Not saying those are the end of the world but there are reasons for each beyond just IOPS.

    • Niron Koren | June 7, 2015 at 7:20 am

      Hi Paul,
      You are right, there are more concerns to take into account when choosing between the two,
      but we tried to keep the focus of this post specifically on IOPS.

      Also, just so you know we plan to compare Citrix MCS vs VMware Linked Clones (Composer) next.

  • Will | December 1, 2015 at 2:10 pm

    Thanks for the great write-up Niron!

  • Christian Hornhues | December 1, 2015 at 3:45 pm

    Correct, it's all about IOPS.

    PVS is best for non-persistent workloads, like XenApp or pooled desktops. Because it's a cheap shop there for maximum power.

    MCS is good for persistent VDIs, where changes must persist. The IOPS bottleneck is solved here on platforms like Nutanix or ILIO.

    This test would have been interesting, because it is clear that RAM performance is higher than any SAN (SSD/disk).

    So you should compare PVS vs. MCS on Nutanix. This would have been the real Deal. And a much better comparison.

  • Stevie T | March 1, 2016 at 2:27 pm

    Christian – I totally agree with you there. Would love to see PVS/MCS head to head on Hyper Converged Infrastructure.

  • fbifido | September 6, 2016 at 6:52 pm

    It’s Sep. 2016.
    How about a re-match with XenDesktop 7.9 & updated tech?

    Thanks.
