I’m a big fan of instant clones and use them on a regular basis, but they do drive me crazy on occasion.
As a prime example, just the other day, one node of a two-host cluster that was running a small instant clone environment went down. When the host came back up, ControlUp Console showed that many instant clone desktops were disconnected.
Moreover, Horizon Administrator showed error messages for many instant clone desktops.
vSphere Client also showed the desktops, as well as some of the cp-parent and cp-replica objects, in an orphaned state.
I was not surprised to see, upon closer inspection, that the orphaned objects were associated with the host that went down.
This made sense to me: the cp-replica and cp-parent objects are associated with a host, and because I used local datastores for this instant clone desktop pool, the instant clone objects that lived on that host became orphaned when it went down.
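To make that reasoning concrete, here is a minimal Python sketch of how I think about which internal objects are at risk after a host failure. It is only a model: the InternalVm record, the object names, and the second host address are hypothetical, and it does not query vCenter Server in any way.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InternalVm:
    """Hypothetical record for an instant clone internal object."""
    name: str
    host: str                 # ESXi host the object is tied to
    on_local_datastore: bool  # True if it lives on that host's local storage


def likely_orphaned(vms: List[InternalVm], failed_host: str) -> List[str]:
    """Return names of objects tied to the failed host via local storage."""
    return [
        vm.name
        for vm in vms
        if vm.host == failed_host and vm.on_local_datastore
    ]


if __name__ == "__main__":
    # Made-up inventory for illustration only.
    inventory = [
        InternalVm("cp-parent-a", "10.0.0.150", True),
        InternalVm("cp-replica-a", "10.0.0.150", True),
        InternalVm("cp-parent-b", "10.0.0.151", True),
    ]
    print(likely_orphaned(inventory, "10.0.0.150"))
```

With the made-up inventory above, only the two objects on 10.0.0.150 are reported, which matches what I saw in vSphere Client.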
Horizon Administrator has four options to recover instant clones: Restart Desktop, Reset Virtual Machine, Recover, and Remove.
Restart Desktop will perform a graceful operating system restart of a desktop, while Reset Virtual Machine will perform a hard power-off and power-on of the virtual machine (VM). Recover will recreate a desktop from its current base image, while Remove will delete the desktop’s VM from the datastore. VMware states that you should NOT delete a VM in vCenter Server before deleting the desktop with Horizon Administrator, as this could put Horizon components into an inconsistent state.
I tried using ControlUp Console to complete the Recover and Restart actions on a desktop, but these attempts failed.
I tried various combinations of the Restart Desktop, Reset Virtual Machine, Recover, and Remove operations to get my desktops back, but these attempts all failed, too. I even scheduled a push image operation on the desktop pool to try to recover it, but this failed as well.
Things were not going well.
After searching VMware documentation and third-party websites for a solution (and coming up empty), I suspected that the source of the problem was the cp-replica and cp-parent objects: they were orphaned and no longer associated with their parent objects, yet Horizon still tried to create the child objects from them. Unfortunately, these are protected objects that can’t be powered on or off, and can’t be deleted using vSphere Client, as these options are grayed out.
VMware, realizing that occasionally things do go wrong with instant clones, has created a suite of command-line tools to deal with these issues, including:
- IcUnprotect.cmd – unprotects folders and VMs so they can be deleted.
- IcMaint.cmd – deletes the master images, which are the parent VMs in vCenter Server, from an ESXi host so that the host can be put into maintenance mode.
- IcCleanup.cmd – unprotects and deletes some or all of the internal VMs created by instant clones. This last command is only available in Horizon versions 7.10 and newer.
These tools are installed by default when you install Horizon Connection Server. Since I wanted to delete the orphaned cp-replica and cp-parent objects, I used IcCleanup.cmd.
To do this, I logged in to the system that was running my Connection Server. I launched a command window as Administrator and changed to the directory for IcCleanup.cmd, which was C:\Program Files\VMware\VMware View\Server\tools\bin.
The syntax for the command is iccleanup.cmd -vc vCenterName -uid userId [-skipCertVeri] [-clientId clientUuid]. I entered iccleanup.cmd -vc 10.0.0.22 -uid firstname.lastname@example.org -skipCertVeri (the IP address of my vCenter Server is 10.0.0.22).
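If you find yourself running this often, the command line can be assembled programmatically. The following is a minimal Python sketch that only builds the argument list from the syntax quoted above; the function name build_iccleanup_args and its parameter names are my own, and it does not launch the tool or handle the password prompt.

```python
from typing import List, Optional


def build_iccleanup_args(
    vcenter: str,
    user_id: str,
    skip_cert_verification: bool = False,
    client_id: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for iccleanup.cmd.

    Follows the syntax quoted in the article:
    iccleanup.cmd -vc vCenterName -uid userId [-skipCertVeri] [-clientId clientUuid]
    """
    if not vcenter or not user_id:
        raise ValueError("vcenter and user_id are both required")

    args = ["iccleanup.cmd", "-vc", vcenter, "-uid", user_id]
    if skip_cert_verification:
        args.append("-skipCertVeri")  # optional flag from the quoted syntax
    if client_id is not None:
        args += ["-clientId", client_id]  # optional client UUID
    return args


if __name__ == "__main__":
    # Reproduces the command I entered against my vCenter Server.
    print(" ".join(
        build_iccleanup_args(
            "10.0.0.22",
            "firstname.lastname@example.org",
            skip_cert_verification=True,
        )
    ))
```

Returning a list rather than a single string keeps each argument separate, which is the form Python's subprocess module expects if you later decide to launch the tool from a script.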
I then entered list, which not only showed the protected objects, but also their relationship with each other.
Next, I entered back, then delete -H 10.0.0.150 to remove all the protected objects associated with the ESXi host that went down (it had an IP address of 10.0.0.150). This delete command also lets you remove a specific type of object (i.e., template, replica, or parent), or those on a specific datastore.
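The order of the commands I typed at the IcCleanup prompt matters, so here is a small Python sketch that writes that sequence down for a given failed host. It only produces the lines to type (list, back, then delete -H); the helper name cleanup_session_for_host is hypothetical, and I have not verified whether the tool accepts these lines as piped input.

```python
from typing import List


def cleanup_session_for_host(failed_host: str) -> List[str]:
    """Return the IcCleanup prompt commands I used, in order.

    1. list   - show the protected objects and their relationships
    2. back   - leave the list view before issuing another command
    3. delete - remove the protected objects tied to the failed host
    """
    host = failed_host.strip()
    # A host must be a single token; reject empty or space-containing values.
    if not host or any(ch.isspace() for ch in host):
        raise ValueError("failed_host must be a single non-empty token")
    return ["list", "back", f"delete -H {host}"]


if __name__ == "__main__":
    for line in cleanup_session_for_host("10.0.0.150"):
        print(line)
```

Review the output of list carefully before running the delete step, since the deletion applies to every protected object associated with that host.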
I could have also deleted all the objects associated with an instant clone desktop pool by entering delete -index 1, but I didn’t do so since I only needed to remove the objects associated with the host that went down.
My vSphere Client showed that all the orphaned objects were successfully removed.
A full list of the commands to manipulate protected instant clone objects can be found in the VMware Horizon 7 Product Documentation within the Instant-Clone Maintenance Utilities section. I found the write-up on using these commands a little bit terse, and there is some useful information it does not include, like how you will get an unrecognized command message if you reenter list before entering back.
At this point, I went to my Horizon Administrator and removed all the desktops that didn’t have a status of Available.
The cp-replica, cp-template, and cp-parent objects were recreated automatically. However, it took about ten minutes before the instant clone desktops were available in Horizon Administrator and I could log in to them from a Horizon client.
Instant clones have been around for many years now, and VMware has indicated that they are the future of its Horizon desktop strategy. I haven’t had any major issues with them, but cleaning up an environment after a host failure may require a few manual steps, as I’ve outlined in this article.