In my previous blog post, I showed how to use ControlUp Automation to augment our troubleshooting by collecting additional information. When it detected that the database had failed, ControlUp Automation automatically started several tasks: a CDF trace, a packet capture, and database and connectivity tests.

Once you have collected enough traces for Citrix to analyze, you are stuck waiting. While we wait for the analysis and a fix to come back, we need a way to remediate these outages.

Automatic Remediation

Using the Citrix Site Database going down as our primary issue, we know Citrix Virtual Apps and Desktops (née XenApp/XenDesktop) can work around it by leveraging the Local Host Cache (LHC). Citrix has a 90-second delay between detecting that the database has failed and LHC taking over. Since a database outage can occur during peak logon times, we need to reduce that delay. In this particular case, enabling the Local Host Cache at the first sign of an outage can bring the disruption down from 90 seconds to just a few seconds.

With ControlUp Automation we can enable LHC immediately upon detection of the database-down event, and subsequently restore services once the database is detected as restored.

Script Action

I’ve created a script action to enable and disable the Local Host Cache. The script accepts one parameter that forces LHC into operation; when the parameter is absent, the Citrix Broker makes its own determination of whether LHC should be used. For the purposes of automation, I force the Local Host Cache on. After automation enables it, manual intervention by an admin is required to switch back to the Citrix Broker deterministic mode.

This is to prevent flapping.  

Flapping is when something in the environment repeatedly goes up and down. If restoration to the Citrix Broker deterministic mode were automated, a flapping database would cause LHC Outage Mode to alternate between enabled and disabled. That could be a far worse state than leaving LHC enabled until the problem has passed.

The process with this automation:

  1. Database outage occurs
  2. Local Host Cache is enabled
  3. Local Host Cache will remain enabled even if Database connectivity is restored.
  4. The administrator will have to decide if the problem has passed and manually set Citrix Broker as Primary, stopping use of the Local Host Cache.
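
To make the "latch" behaviour in the steps above concrete, here is a minimal Python sketch. It is not the script action itself and it does not call any ControlUp or Citrix API; the class `LhcLatch`, its fields, and the event list are hypothetical names made up for illustration. It only models the decision logic: a database-down event forces LHC on, a database-up event does nothing, and only an administrator reset hands control back to the Citrix Broker. An `auto_restore` flag is included purely to show what a flapping database would do if restoration were automated.

```python
from dataclasses import dataclass


@dataclass
class LhcLatch:
    """Hypothetical model of the trigger logic; not ControlUp or Citrix code."""
    auto_restore: bool = False  # False = latched behaviour described in the steps
    forced: bool = False        # True while LHC is forced on
    mode_changes: int = 0       # number of times the mode was switched

    def on_database_event(self, database_up: bool) -> None:
        # Steps 1 and 2: a database outage forces LHC on (only once).
        if not database_up and not self.forced:
            self.forced = True
            self.mode_changes += 1
        # Step 3: with auto_restore off, a recovered database changes nothing.
        elif database_up and self.forced and self.auto_restore:
            self.forced = False
            self.mode_changes += 1

    def admin_reset(self) -> None:
        # Step 4: only the administrator hands control back to the Broker.
        if self.forced:
            self.forced = False
            self.mode_changes += 1


# A flapping database: down, up, down, up, down, up.
events = [False, True, False, True, False, True]

latched = LhcLatch(auto_restore=False)
naive = LhcLatch(auto_restore=True)
for database_up in events:
    latched.on_database_event(database_up)
    naive.on_database_event(database_up)

print(latched.forced, latched.mode_changes)  # True 1
print(naive.forced, naive.mode_changes)      # False 6

latched.admin_reset()
print(latched.forced, latched.mode_changes)  # False 2
```

With the same six events, the latched version switches mode once and waits for the administrator, while the auto-restoring version switches six times. That churn is what the manual step is meant to avoid.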

The script is available here:

Or, the complete ControlUp Script Action is here. Simply save it as an XML file and import it into the Script Management window in ControlUp.

 

Trigger

Due to the criticality of a database outage and the required intervention by an administrator, the trigger will have a second action assigned. I will use the new ControlUp Email Templates and configure one for this script result. An email template lets us tailor the alert. Since this issue could cause a Major Incident, the email alert will use emphasis and color coding to make sure the message gets noticed. More information on email templates can be found here.

I’ll configure the template like so:

The text for the email template:

 

The result of the email template:

I set up the trigger as follows:

 

Now let’s watch automatic remediation in action.

Video of AA in action

 

 

 
