Troubleshooting Citrix PVS Issues with ControlUp

Citrix Provisioning Services (PVS) is a popular technology for managing XenApp and XenDesktop images in Citrix environments. PVS enables multiple desktops or servers to boot from a single VHD-based image, simplifying image management and eliminating the need to update and patch individual systems.

In this blog post, we will cover some common PVS issues and challenges that Citrix admins might encounter, provide some background on each, and explain how ControlUp can help tackle these challenges and simplify PVS management and troubleshooting.

Common Citrix PVS Issues and Challenges

Monitor target device write cache size

Each target device has a personal area called the write cache, which holds all data written by the target device. PVS supports multiple locations for the write cache, but the most common types are “Cache on device hard drive” and “Cache in device RAM with overflow on hard disk”.

The ControlUp “Get PVS Write Cache Size” script-based action (SBA) enables quick monitoring of the write cache size on multiple devices for both RAM-based and disk-based cache types:

The output of the “Get PVS Write Cache Size” SBA on multiple XenApp/XenDesktop VMs
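To show what a disk-side check of this kind involves, here is a minimal Python sketch that reports the size of the write cache files in a folder. It is not the SBA's actual code. The file names `.vdiskcache` (cache on device hard drive) and `vdiskdif.vhdx` (overflow file for RAM cache with overflow), the `D:\` default path, and the function names are assumptions for illustration. The RAM-resident part of a RAM cache does not appear as a file at all.

```python
import os
from typing import Dict, Iterable

# Assumed write cache file names (hypothetical defaults; verify in your environment):
#   ".vdiskcache"   -> "Cache on device hard drive"
#   "vdiskdif.vhdx" -> overflow file for "Cache in device RAM with overflow on hard disk"
DEFAULT_CACHE_FILES = (".vdiskcache", "vdiskdif.vhdx")


def write_cache_sizes(cache_dir: str,
                      names: Iterable[str] = DEFAULT_CACHE_FILES) -> Dict[str, int]:
    """Return {file name: size in bytes} for each write cache file found in cache_dir."""
    sizes = {}
    for name in names:
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes


def total_mb(sizes: Dict[str, int]) -> float:
    """Sum the file sizes and convert bytes to megabytes."""
    return sum(sizes.values()) / (1024 * 1024)


if __name__ == "__main__":
    # Assumption: the cache drive on the target device is D:\ (adjust as needed).
    found = write_cache_sizes("D:\\")
    print(f"{total_mb(found):.1f} MB in {found}")
```

To cover many devices, you would run the same check on each one and collect the results in one place.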

The upcoming ControlUp 5.x release monitors the Non-paged Pool Memory size. Because the PVS RAM cache is allocated from non-paged pool memory, this enables automatic alerts when the RAM cache size reaches a certain threshold:

The new Non-paged Pool Memory computer column in ControlUp 5.0
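Threshold-based alerting on this metric can be sketched in a few lines of Python. This is not ControlUp's alerting engine. The warning and critical ratios, the comparison against a configured RAM cache size, and all names are hypothetical. Non-paged pool also holds other kernel allocations, so the ratio is only an approximation of RAM cache usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical alert thresholds, as fractions of the configured RAM cache size."""
    warning: float = 0.75
    critical: float = 0.90


def classify_nonpaged_pool(nonpaged_bytes: int, ram_cache_max_bytes: int,
                           t: Thresholds = Thresholds()) -> str:
    """Return 'ok', 'warning' or 'critical' for one device's non-paged pool usage."""
    if ram_cache_max_bytes <= 0:
        raise ValueError("ram_cache_max_bytes must be positive")
    ratio = nonpaged_bytes / ram_cache_max_bytes
    if ratio >= t.critical:
        return "critical"
    if ratio >= t.warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    MB = 1024 * 1024
    samples = {"XA-01": 950 * MB, "XA-02": 800 * MB, "XA-03": 400 * MB}  # made-up values
    for device, used in samples.items():
        print(device, classify_nonpaged_pool(used, 1024 * MB))
```

Splitting the limit into two levels means admins get a warning while there is still headroom, before the cache starts spilling to disk.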

Troubleshooting target device boot issues

The target device boot process is fairly complex and depends on multiple components to work properly. Citrix has published an excellent diagram that illustrates the boot process.

In addition to the boot process diagram, the following presentation covers troubleshooting target device boot issues in detail: http://www.slideshare.net/davidmcg/troubleshooting-provisioning-services-target-boot-processes. Another useful blog post on troubleshooting the image boot process in PVS 7.6 farms was recently published: http://www.b-critical.com/2015/09/monitoring-and-troubleshooting-pvs-7-6/

ControlUp can assist in troubleshooting the target device boot process by providing a holistic view of the different components involved (e.g. PVS servers, DHCP, domain controllers) and aggregating all troubleshooting data in a single location. Specifically, the Events Pane shows all PVS-related events in one place, and the File System Controller can gather log files from multiple PVS target devices into a single location.
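The value of aggregation comes from seeing events from several components on one timeline. The Python sketch below merges per-source event lists into a single chronological stream. The `Event` record, the source names, and the sample messages are hypothetical. It is not how the Events Pane is implemented.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterator, List, NamedTuple


class Event(NamedTuple):
    time: datetime
    source: str   # e.g. a PVS server, DHCP server or domain controller (hypothetical)
    message: str


def merged_timeline(per_source: Dict[str, List[Event]]) -> Iterator[Event]:
    """Merge events from every source into one stream ordered by timestamp."""
    # Sort each source first so heapq.merge's sorted-input precondition holds.
    ordered = (sorted(events, key=lambda e: e.time) for events in per_source.values())
    return heapq.merge(*ordered, key=lambda e: e.time)


if __name__ == "__main__":
    t = datetime(2015, 9, 1, 8, 0)
    logs = {  # made-up sample events
        "DHCP01": [Event(t.replace(second=2), "DHCP01", "lease offered")],
        "PVS01": [Event(t.replace(second=5), "PVS01", "target device logged in"),
                  Event(t.replace(second=1), "PVS01", "boot request received")],
    }
    for e in merged_timeline(logs):
        print(e.time.time(), e.source, e.message)
```

Reading the merged stream top to bottom shows which component stalled first during a failed boot.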

Monitor target device retries

Target device retry definition (from http://support.citrix.com/article/ctx117491) – The client’s driver performs a vDisk I/O by sending a request to the Provisioning Server. If a transaction fails because of a timeout (which is a no-reply timeout), the driver tries to receive the packet again. The retry number is accumulated by the client and reported to the client’s statistics tray.
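To make that definition concrete, here is a simplified Python model of the behavior it describes: a request is sent, and each no-reply timeout leads to another attempt and increments a cumulative retry counter. The `send_request` transport, the `max_attempts` limit, and the class name are assumptions for illustration. The real driver's timeouts and limits are not modeled.

```python
from typing import Callable, Optional


class RetryCounter:
    """Simplified model of a target device driver's cumulative retry count."""

    def __init__(self) -> None:
        self.retries = 0  # accumulated across all requests, like the reported statistic

    def read_block(self, send_request: Callable[[int], Optional[bytes]],
                   block: int, max_attempts: int = 10) -> bytes:
        """Request one vDisk block, trying again after each no-reply timeout.

        send_request is a hypothetical transport: it returns the data, or None on timeout.
        """
        for _ in range(max_attempts):
            reply = send_request(block)
            if reply is not None:
                return reply
            self.retries += 1  # every timeout adds to the cumulative retry count
        raise TimeoutError(f"block {block}: no reply after {max_attempts} attempts")
```

Because the counter only ever grows, a high value on a long-running device means little by itself. What matters is how quickly it is increasing.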

Often, excessive target device retries indicate a performance issue that needs to be investigated. The ControlUp “Get PVS Target Device Retries” script-based action enables quick monitoring of the retry count on multiple computers, making it easy to find out whether the issue exists in the customer environment:

The output of the “Get PVS Target Device Retries” SBA on multiple XenApp VMs
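Retry counts are cumulative, so one way to decide what counts as "excessive" is to compare two snapshots and flag devices whose count grew too much in between. The following Python sketch takes that approach. The threshold of 100 new retries, the device names, and the reset handling are hypothetical choices, not ControlUp behavior.

```python
from typing import Dict, List, Tuple


def excessive_retries(before: Dict[str, int], after: Dict[str, int],
                      max_new_retries: int = 100) -> List[Tuple[str, int]]:
    """Return (device, new retries) pairs whose count grew by more than the limit.

    Counts are cumulative, so only the growth between the two samples is compared.
    A device missing from `before` (e.g. newly booted) is measured from zero.
    A count that went down (device rebooted, counter reset) is taken as all new retries.
    """
    flagged = []
    for device, count in after.items():
        previous = before.get(device, 0)
        delta = count - previous if count >= previous else count
        if delta > max_new_retries:
            flagged.append((device, delta))
    # Worst offenders first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Sampling at a fixed interval (say, every few minutes) turns the comparison into a rate, which is easier to compare across devices with different uptimes.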

Computer AD account password sync issue

Issues with the streamed computer account password can cause a PVS target device to boot with the dreaded “The trust relationship between this workstation and the primary domain failed” message. Citrix has a pretty good KB article on troubleshooting this issue. Often, the solution involves resetting the computer account password via the PVS console.

ControlUp includes a script-based action called “Citrix PVS Reset Machine Account Password” that automates resetting the PVS target device AD password. With a single click, ControlUp resets the password and boots the target device:

The “PVS Reset Machine Account Password” SBA output

App crashes (the ASLR bug)

As described in CTX139627, applications may crash randomly on target devices that use the “Write Cache on Target Hard Disk” option. In many cases, the applications crash with exception code 0xc0000005 (an access violation), which is visible in the Event Viewer.

The ControlUp Incidents Pane helps customers quickly find out whether they are affected by the PVS ASLR bug. Because all application error events are aggregated in a single location, the customer can search for event ID 1000 (or the 0xc0000005 exception code) and check whether the app crashes correlate with streamed PVS XenApp or XenDesktop VMs. The same feature can also be used to verify that the issue is fixed after migrating to the cache in RAM option in PVS 7.1 or later.

Searching the Incidents Pane for events containing the “0xc0000005” exception code
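The correlation step can be illustrated with a short Python sketch. It counts Application Error events (ID 1000) that mention 0xc0000005 per computer, then computes what share of them came from PVS-streamed machines. The event record layout (`event_id`, `computer`, `message` keys) and the set of PVS computer names are hypothetical inputs. This is not how the Incidents Pane works internally.

```python
from collections import Counter
from typing import Dict, Iterable, Set

ACCESS_VIOLATION = "0xc0000005"


def access_violation_crashes(events: Iterable[Dict[str, object]]) -> Counter:
    """Count event ID 1000 records mentioning exception code 0xc0000005, per computer."""
    hits: Counter = Counter()
    for ev in events:
        message = str(ev.get("message", "")).lower()  # match 0xC0000005 and 0xc0000005
        if ev.get("event_id") == 1000 and ACCESS_VIOLATION in message:
            hits[ev.get("computer")] += 1
    return hits


def pvs_share(hits: Counter, pvs_computers: Set[str]) -> float:
    """Fraction of the matching crashes that happened on PVS-streamed computers."""
    total = sum(hits.values())
    if total == 0:
        return 0.0
    return sum(n for computer, n in hits.items() if computer in pvs_computers) / total
```

A share close to 1.0 points toward the PVS write cache option rather than the application itself. Running the same count again after changing the cache type helps confirm the fix.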

Summary

Citrix PVS is an excellent solution for simplifying image management in XenApp and XenDesktop environments. However, because it depends on multiple infrastructure components, troubleshooting can be long and difficult.

As we have shown in this blog post, ControlUp's real-time views, management scripts and event/log aggregation features help simplify Citrix PVS management and troubleshooting. Please use the comments section to list additional Citrix PVS challenges you see in the field, or to comment on the items covered in this post.