ControlUp Automation – Faster Than a Human – Augmented Troubleshooting

The challenges of troubleshooting intermittent problems  

the connection between the Citrix Broker Service and the database has been lost

I’m working with a customer who’s having an intermittent issue. The database that Citrix connects to occasionally becomes unavailable — why, no one knows. Since Citrix has a 90 second delay from the database outage to the local host cache (LHC) activating, this outage is really disruptive.  If it happens during the morning logon rush or a shift change, you could be preventing hundreds or thousands of end users from working.

Citrix has asked for some CDF traces. However, this problem might only last for a few minutes and resolve itself by the time the ticket comes. 

The ControlUp Automation Platform can assist with intermittent problems in two ways: augmented troubleshooting and automatic remediation. This post will cover augmented troubleshooting.

Augmented Troubleshooting

ControlUp can pick up on the events (like the one described above) as they are detected by the broker(s) and immediately start the information capture. Since this issue is intermittent, the capture will stop when the issue resolves itself.

Typically, the ask is to get multiple traces. This means letting the environment exist in the database down state until enough CDF trace files are captured for Citrix to analyze. 

Even though Citrix has asked only for CDF traces, we can do better. I’ve configured four additional scripts to provide us even more context for what the issue might be.

The first script is a continuous ping test to validate connectivity.
The second script leverages the Citrix Test-BrokerDBConnection cmdlet.
The third script executes a packet capture of network traffic.
The fourth executes a CDF trace.  

CDF Trace Setup

The first thing you need to do is set up our environment so you can execute CDF traces.

Download CDFControl and extract it somewhere in your system. For the purposes of this post, I’ve set it up in this path: C:SwinstCDFControl

Open CDFControl on your Citrix Desktop Delivery Controller (DDC). Under the “Trace Categories” menu, select “All Modules”.

CDFControl CDF Control

Under “Tools” select “Create CTL File”.

Create CTL File

Save the file in your CDFControl folder.

In my examples, I’ll be storing the CDFControl.exe here: “C:SwinstCDFControlCDFControl.exe”.

Copy the whole folder with CDFControl and your CTL file to all other Citrix Desktop Delivery Controllers.

You are now set up to gather additional information.

Script Actions

The first script will start a continuous ping against our database with timestamps.
The second script will use the Citrix cmdlet’s to execute a database health check.
The third script starts a network packet capture.
The fourth script will start CDF Trace.

Script 1 – Continuous Ping until the database responds

#requires -version 3
$ErrorActionPreference = 'Stop'
$ProgressPreference = 'silentlyContinue'
<#
    .SYNOPSIS
    Starts a continuous ping test against the Citrix database and logs the output.
 
    .DESCRIPTION
    Starts a continuous ping test against the Citrix database and logs the output if the Citrix Broker Service cannot contact the database.  Another event-based task will be created and linked to event
    ID 1200, Citrix Broker Service, Info to stop the ping.
 
    .NOTES
    Write-Output commands are captured by ControlUp and put into the ControlUp Auditing events.  So make them nice :)
    
#>
#region variables
#Change these paths to reflect your environment
$TraceOutput = "$($args[0])"

#Database restored event log values
[string]$sourceIdentifier = 'Citrix Broker Service'
[string]$eventLogName = 'Application'
[string]$eventProvider = 'Citrix Broker Service'
[int]$eventId = '1200'
[string]$eventName = 'EntryWritten'
#endregion variables

<#
Trigger on an event log entry being created
@guyrleech 2019
#>
$eventLog = Get-EventLog -List | Where-Object { $_.Log -eq $eventLogName }

try {
    $latestOutageEvent = (Get-EventLog -LogName Application -Source "Citrix Broker Service" -After (Get-Date -Format yyyy-MM-dd) | Where {$_.EventID -eq 1201})[0]
} catch {
    Write-Error "Unable to find the Event Id 1201 from source Citrix Broker Service for today."
}

if( ! $eventLog )
{
    Throw "Unable to find event log `"$eventLogName`""
}

$DBConnection = Get-ItemPropertyValue -Path HKLM:SOFTWARECitrixDesktopServerDataStoreConnectionsController -Name ConnectionString
$server = (($DBConnection -split ";")[0] -split "=")[1]
$1201TimeStamp = Get-Date $latestOutageEvent.TimeWritten -Format yyyy-MM-dd_HH-mm-ss
Unregister-Event -SourceIdentifier $sourceIdentifier -Force -ErrorAction SilentlyContinue
Register-ObjectEvent -InputObject $eventLog -EventName $eventName -SourceIdentifier $sourceIdentifier
Write-Verbose "Going into main loop @ $(Get-Date -Format G)"
try
{
    While( $true )
    {     
        $pingobj = Test-NetConnection -ComputerName $server
        $outputObj = New-Object PSObject
        $outputObj | Add-Member -MemberType NoteProperty -Name "Time" -Value (Get-Date -Format yyyy-MM-dd_HH-mm-ss)
        $outputObj | Add-Member -MemberType NoteProperty -Name "Server" -Value ($server)
        foreach ($property in ($pingobj | Get-Member | where {$_.MemberType -eq "Property"})) {
            Switch ($property.Name) {
                "RemoteAddress"  { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name).IPAddressToString)                           }
                "PingReplyDetails"          { $outputObj | Add-Member -MemberType NoteProperty -Name "PingReplyDetails (RTT)" -Value ($pingobj.$($property.Name).RoundTripTime)               }
                "InterfaceAlias"                  { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name))                            }
                "SourceAddress"                { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name).IpAddress)                     }
                "PingSucceeded"       { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name))                                        }
            }
        }
        $outputObj | Export-Csv -Path "$traceOutput$($1201TimeStamp)_PingTest.txt" -NoClobber -Append -NoTypeInformation
        $eventRaised = Wait-Event -Timeout 2 #this allows the ping loop to continue
        if( $eventRaised -and $eventRaised.SourceIdentifier -eq $sourceIdentifier )
        {
            $event = $eventRaised.SourceEventArgs.Entry
            if( $event )
            {
                if( $event.InstanceId -eq $eventId -and $event.Source -eq $eventProvider )
                {
                    Write-Host "Monitored event raised at $($eventRaised.TimeGenerated)"
                    Write-Host "Stopping CDF Trace."
                    Start-Process -FilePath $CDFControlEXEPath -ArgumentList @("-stop","-noprompt")
                    if ($data.IsCompleted -ne $true)  { $pscmd.Stop()  }
                    if ($data2.IsCompleted -ne $true) { $pscmd2.Stop() }
                    break
                }
            }
        }
        elseif( $eventRaised )
        {
            Write-Verbose "Caught unexpected event $eventRaised"
            Start-Sleep -Seconds 1
        }
        $eventRaised | Remove-Event
    }
}
catch
{
    Throw $_
}
finally
{
    Write-Verbose "Unregistering event source identifier $sourceIdentifier"
    Unregister-Event -SourceIdentifier $sourceIdentifier -Force
}

 

Script 2 – Test the Citrix Broker Database using the Citrix cmdlets

#requires -version 3
$ErrorActionPreference = 'Stop'

<#
    .SYNOPSIS
    Tests the Citrix Broker Database.
 
    .DESCRIPTION
    Starts a test of the Broker Database Connection if the Citrix Broker Service cannot contact the database. 
#>
#region variables
#Change these paths to reflect your environment
$TraceOutput = "$($args[0])"

$eventLog = Get-EventLog -List | Where-Object { $_.Log -eq $eventLogName }
try {
    $latestOutageEvent = (Get-EventLog -LogName Application -Source "Citrix Broker Service" -After (Get-Date -Format yyyy-MM-dd) | Where {$_.EventID -eq 1201})[0]
} catch {
    Write-Error "Unable to find the Event Id 1201 from source Citrix Broker Service for today."
}

#Get the timestamp for this last event.
$1201TimeStamp = Get-Date $latestOutageEvent.TimeWritten -Format yyyy-MM-dd_HH-mm-ss

ipmo Citrix.Broker.Commands
$DBConnection = Get-ItemPropertyValue -Path HKLM:SOFTWARECitrixDesktopServerDataStoreConnectionsController -Name ConnectionString
$testDBConnection = Test-BrokerDBConnection -DBConnection $DBConnection
Write-Output "Service Status`n--------------"  | Out-file "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt"
$testDBConnection.ServiceStatus | Out-file "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt" -Append
$keys = $testDBConnection.ExtraInfo.keys
foreach ($key in $keys) {
    Write-Output "`n" | Out-file "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt" -Append
    $key | Out-File "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt" -Append
    Write-Output "--------------"  | Out-file "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt" -Append
    $testDBConnection.ExtraInfo.$key | Out-File "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt" -Append
}

 

Script 3 – Capture network traffic

#requires -version 3
$ErrorActionPreference = 'Stop'
$ProgressPreference = 'silentlyContinue'
<#
    .SYNOPSIS
    Starts a network capture and stops when the database is restored.
 
    .DESCRIPTION
    Starts a network capture and stops when the Citrix Broker Service cannot contact the database.  Another event-based task will be created and linked to event
    ID 1200, Citrix Broker Service, Info to stop the capture.
 
    .NOTES
    Write-Output commands are captured by ControlUp and put into the ControlUp Auditing events.  So make them nice :)
    
#>
#region variables
#Change these paths to reflect your environment
$TraceOutput = "$($args[0])"

#Database restored event log values
[string]$sourceIdentifier = 'Citrix Broker Service'
[string]$eventLogName = 'Application'
[string]$eventProvider = 'Citrix Broker Service'
[int]$eventId = '1200'
[string]$eventName = 'EntryWritten'
#endregion variables

#region functions

#ConvertTo-PCAP taken from the great Malcolm McCaffery.  Code reused with his permissions.  Please see his blog posting here for more details: https://chentiangemalc.wordpress.com/2018/10/08/convert-netsh-trace-etl-to-pcap-with-powershell/
function ConvertTo-PCAP {
[CmdletBinding()]
param(
[Parameter(Position=0)]
[ValidateScript({
    if( -Not ($_ | Test-Path) ){
        throw "File or folder $_ does not exist"
    }
 
    if($_.Extension -ne ".etl"){
        throw "Source file must be .etl file"
    }
    return $true
})]
[System.IO.FileInfo]$Path,
 
[Parameter(Position=1)]
[ValidateScript({
    if( -Not ($path.DirectoryName | Test-Path) ){
        throw "File or folder does not exist"
    }
 
    if($_.Extension -ne ".pcap") {
        throw "Destination file must be .pcap file"
    }
    return $true
})]
[System.IO.FileInfo]$Destination,
 
[Parameter(Position=2)]
[Uint32]$MaxPacketSizeBytes = 65536)
 
 
$csharp_code = @'
using System;
using System.Collections.Generic;
using System.Diagnostics.Eventing.Reader;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
 
namespace chentiangemalc
{
    public static class NetworkRoutines
    {
    public static long ConvertEtlToPcap(string source, string destination, UInt32 maxPacketSize)
        {
            int result = 0;
            using (BinaryWriter writer = new BinaryWriter(File.Open(destination, FileMode.Create)))
            {
 
                UInt32 magic_number = 0xa1b2c3d4;
                UInt16 version_major = 2;
                UInt16 version_minor = 4;
                Int32 thiszone = 0;
                UInt32 sigfigs = 0;
                UInt32 snaplen = maxPacketSize;
                UInt32 network = 1; // LINKTYPE_ETHERNET
 
                writer.Write(magic_number);
                writer.Write(version_major);
                writer.Write(version_minor);
                writer.Write(thiszone);
                writer.Write(sigfigs);
                writer.Write(snaplen);
                writer.Write(network);
 
                long c = 0;
                long t = 0;
                using (var reader = new EventLogReader(source, PathType.FilePath))
                {
                    EventRecord record;
                    while ((record = reader.ReadEvent()) != null)
                    {
                       // Console.WriteLine(String.Format("record: {0}",record.ToXml()));
                        c++;
                        t++;
                        if (c == 10000)
                        {
                            Console.WriteLine(String.Format("Processed {0} events with {1} packets processed",t,result));
                            c = 0;
                        }
                        using (record)
                        {
                            if (record.ProviderName == "Microsoft-Windows-NDIS-PacketCapture")
                            {

                                if (record.Properties.Count() >= 4)
                                {
                                    if (record.Properties[2].Value.GetType() == typeof(UInt32))
                                    {
                                        //Console.WriteLine(String.Format("record.properties: {0}",record.Properties.Count()));
                                
                                        result++;
                                        DateTime timeCreated = (DateTime)record.TimeCreated;
                                        //Console.WriteLine(String.Format("record.TimeCreated: {0}",record.TimeCreated));
                                        UInt32 ts_sec = (UInt32)((timeCreated.Subtract(new DateTime(1970, 1, 1))).TotalSeconds);
                                        //Console.WriteLine(String.Format("record.ts_sec: {0}",ts_sec));
                                        UInt32 ts_usec = (UInt32)(((timeCreated.Subtract(new DateTime(1970, 1, 1))).TotalMilliseconds) - ((UInt32)((timeCreated.Subtract(new DateTime(1970, 1, 1))).TotalSeconds * 1000))) * 1000;
                                        //Console.WriteLine(String.Format("record.ts_usec: {0}",ts_usec));
                                        //Console.WriteLine(String.Format("record.incl_len: {0}",record.Properties[2].Value.GetType()));
                                        UInt32 incl_len = (UInt32)record.Properties[2].Value;
                                        //Console.WriteLine(String.Format("record.incl_len: {0}",record.Properties[2].Value));
                                        if (incl_len > maxPacketSize)
                                        {
                                            Console.WriteLine(String.Format("Packet size of {0} exceeded max packet size {1}, packet ignored",incl_len,maxPacketSize));
                                        }
                                        UInt32 orig_len = incl_len;
 
                                        writer.Write(ts_sec);
                                        writer.Write(ts_usec);
                                        writer.Write(incl_len);
                                        writer.Write(orig_len);
                                        writer.Write((byte[])record.Properties[3].Value);

                                    }
                                }
                            }
                        }
                    }
                }
            }
            return result;
        }
    }
}
'@
 
Add-Type -Type $csharp_code
 
$result = [chentiangemalc.NetworkRoutines]::ConvertEtlToPcap($Path.FullName,$Destination.FullName,$MaxPacketSizeBytes)
 
Write-Host "$result packets converted."
}

#endregion functions


<#
Trigger on an event log entry being created
@guyrleech 2019
#>
$eventLog = Get-EventLog -List | Where-Object { $_.Log -eq $eventLogName }

try {
    $latestOutageEvent = (Get-EventLog -LogName Application -Source "Citrix Broker Service" -After (Get-Date -Format yyyy-MM-dd) | Where {$_.EventID -eq 1201})[0]
} catch {
    Write-Error "Unable to find the Event Id 1201 from source Citrix Broker Service for today."
}

if( ! $eventLog )
{
    Throw "Unable to find event log `"$eventLogName`""
}

$1201TimeStamp = Get-Date $latestOutageEvent.TimeWritten -Format yyyy-MM-dd_HH-mm-ss
Unregister-Event -SourceIdentifier $sourceIdentifier -Force -ErrorAction SilentlyContinue
Register-ObjectEvent -InputObject $eventLog -EventName $eventName -SourceIdentifier $sourceIdentifier
Write-Verbose "Going into main loop @ $(Get-Date -Format G)"


if ((Get-NetEventSession).SessionStatus -eq "Running") {
    Write-Error "Exisiting trace is underway.  Aborting."
    Exit
}
if ((Get-NetEventSession).SessionStatus -eq "NotRunning") { 
    Get-NetEventSession | Remove-NetEventSession
}

try {
    New-NetEventSession -Name "Session1" -LocalFilePath "$traceOutput$($1201TimeStamp)_capturetraffic.etl" -ErrorAction Stop | Out-Null
    Add-NetEventPacketCaptureProvider -SessionName "Session1" -ErrorAction Stop | Out-Null
    Start-NetEventSession -Name "Session1" -ErrorAction Stop  | Out-Null
} catch {
    Write-Error "Starting capture failed with : $($error[0])"
}

try
{
    While( $true )
    {
        $eventRaised = Wait-Event -Timeout 2 #this allows the ping loop to continue
        if( $eventRaised -and $eventRaised.SourceIdentifier -eq $sourceIdentifier )
        {
            $event = $eventRaised.SourceEventArgs.Entry
            if( $event )
            {
                if( $event.InstanceId -eq $eventId -and $event.Source -eq $eventProvider )
                {
                    Write-Host "Monitored event raised at $($eventRaised.TimeGenerated)"
                    Write-Host "Stopping Packet Capture."
                    Stop-NetEventSession -Name "Session1"
                    Remove-NetEventSession -Name "Session1"
                    
                    #convert to cap file for wireshark
                    ConvertTo-PCAP -Path "$traceOutput$($1201TimeStamp)_capturetraffic.etl" -Destination "$TraceOutput$($1201TimeStamp).pcap"
                    break
                }
            }
        }
        elseif( $eventRaised )
        {
            Write-Verbose "Caught unexpected event $eventRaised"
            Start-Sleep -Seconds 1
        }
        $eventRaised | Remove-Event
    }
}
catch
{
    Throw $_
}
finally
{
    Write-Verbose "Unregistering event source identifier $sourceIdentifier"
    Unregister-Event -SourceIdentifier $sourceIdentifier -Force
}

 

Script 4 – Execute a CDF Trace during the outage

#requires -version 3
$ErrorActionPreference = 'Stop'

<#
    .SYNOPSIS
    Starts CDF Trace if an existing trace is not running.  Targeted specifically for database outages noticed by the Citrix Broker Service.
 
    .DESCRIPTION
    Starts CDF Trace if the Citrix Broker Service cannot contact the database.  Another event-based task will be created and linked to event
    ID 1200, Citrix Broker Service, Info to stop the CDF trace.
  
#>
#region variables
#Change these paths to reflect your environment
$CDFControlEXEPath = "$($args[0])"
$CTLFile = "$($args[1])"
$TraceOutput = "$($args[2])"

#Database restored event log values
[string]$sourceIdentifier = 'Citrix Broker Service'
[string]$eventLogName = 'Application'
[string]$eventProvider = 'Citrix Broker Service'
[int]$eventId = '1200'
[string]$eventName = 'EntryWritten'
#endregion variables


try {
    #see if an existing CDF trace is running
    if ((logman -ets | Select-String -SimpleMatch "CDFCONTROL_TRACE_SESSION" -Quiet) -ne $true) { 
        Write-Error "No CDF Trace Session found." -ErrorAction Stop
    } else {
        Write-Host "An existing CDF Trace is already running."
    }
} catch {
    Write-Host "Starting a CDF Trace."
    Start-Process -FilePath $CDFControlEXEPath -ArgumentList @("-start","-guids","$CTLFile","-path","$traceOutput")


    <#
    Trigger on an event log entry being created
    @guyrleech 2019
    #>
    $eventLog = Get-EventLog -List | Where-Object { $_.Log -eq $eventLogName }
    if( ! $eventLog )
    {
        Throw "Unable to find event log `"$eventLogName`""
    }
    Unregister-Event -SourceIdentifier $sourceIdentifier -Force -ErrorAction SilentlyContinue
    Register-ObjectEvent -InputObject $eventLog -EventName $eventName -SourceIdentifier $sourceIdentifier
    Write-Verbose "Going into main loop @ $(Get-Date -Format G)"
    try
    {
        While( $true )
        {
            $eventRaised = Wait-Event
            if( $eventRaised -and $eventRaised.SourceIdentifier -eq $sourceIdentifier )
            {
                $event = $eventRaised.SourceEventArgs.Entry
                if( $event )
                {
                    if( $event.InstanceId -eq $eventId -and $event.Source -eq $eventProvider )
                    {
                        Write-Host "Monitored event raised at $($eventRaised.TimeGenerated)"
                        Write-Host "Stopping CDF Trace."
                        Start-Process -FilePath $CDFControlEXEPath -ArgumentList @("-stop","-noprompt")
                        break
                    }
                }
            }
            elseif( $eventRaised )
            {
                Write-Verbose "Caught unexpected event $eventRaised"
                Start-Sleep -Seconds 1
            }
            $eventRaised | Remove-Event
        }
    }
    catch
    {
        Throw $_
    }
    finally
    {
        Write-Verbose "Unregistering event source identifier $sourceIdentifier"
        Unregister-Event -SourceIdentifier $sourceIdentifier -Force
    }
}

Configuring the Trigger for Automation

Now you need to configure the trigger. For Citrix database outages you can key in on a specific Citrix event. In this case, that’s Event ID 1201 from the Citrix Broker Service:

Event id equals to 1201 Log is Application Source is Citrix Broker Service Type equals to Warning

You then can configure an Event-based trigger with the following properties:

Now, you attach multiple script actions to the trigger.  

By default, ControlUp will execute automation sequentially, with each script waiting until the other has completed before starting the next. For this scenario you need to set up the automation for simultaneous script execution. Simply add a registry value to the ControlUp Monitor:

Registry Key: HKEY_USERSS-1-5-20SoftwareSmart-XControlUpMonitorSvc
Name: MaxAutoExecJobsTarget
Type: REG_DWORD
Value: (in decimal) number of allowed simultaneous scripts

Let’s see it in action:

 

In my next blog I’ll cover how to use ControlUp Automation to automatically remediate the problem. 

 

About the author

Trentent Tye

Trentent Tye, a Tech Person of Interest, is based out of Canada and its many, many feet of snow. FUN FACT: Trentent came to ControlUp because, as a former customer, the product impacted his life in so many positive ways—from reducing stress, time to remediation, increased job satisfaction, and more—he had to be our evangelist. Now an integral part of ControlUp’s Product Marketing Team, he educates our customers, pours his heart and soul into the product, and generally makes ControlUp a better place. Trentent recently moved to be closer to family. He does not recommend moving during a pandemic.