In 2020, uncontrolled endpoints and environments (home PCs, home wireless networks, and home internet connections) made IT administrators’ jobs exponentially more difficult. Before the pandemic, IT admins could use ControlUp to look at their infrastructure and see metrics like “User Input Delay,” CPU and memory usage, and other performance metrics related to datacenter resources. If these metrics looked good, admins had to resort to a process of elimination and place the blame on the user’s device or environment.
But you couldn’t be sure. It was a guess-and-test process.
“What about a different device?”
“Try a different wireless network?”
“Tell your ISP to upgrade your internet service?”
Each of these things costs time and productivity when you’re forced to troubleshoot them blind. And honestly, it’s one of the most stressful and frustrating processes… for both the administrator AND the user.
Here’s how Remote DX can help:
Oh, Wi-Fi, Wi-Fi, Wi-Fi. It’s always Wi-Fi!
At least most of the time, right?
ControlUp’s new Wi-Fi metrics—Wi-Fi radio type, Wi-Fi signal strength, and Client NIC speed—combine to give you a unique view into the Wi-Fi performance of the client devices in your environment.
From playing with this in my home, I’ve found that Wi-Fi often reduces the speed of the connection in an attempt to maintain higher signal strength.
This does mean that setting a stress threshold on Client NIC Speed is normally only advisable where you have tight control over the access points and client devices. Users working from home are probably using their ISP-provided access point, so their performance levels can be vastly different. An 802.11g access point, for example, tops out at 54 Mbps, while my 802.11ac network topped out at 1,300 Mbps.
Keying in on the Wi-Fi signal metric provides a good proxy for the quality of a device’s connection to the local Wi-Fi network. In my tests, values below ~70% meant I was too far from the access point. Since I knew my maximum connection speed, I could almost pinpoint my distance from the access point by correlating signal strength with the Client NIC speed.
Keeping an eye on the Wi-Fi signal metric is a great way to view the health of your remote networks. The lower the signal strength, the more likely a user is to get disconnected or drop packets, which means your users’ digital experience probably isn’t great.
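To make the correlation between signal strength and Client NIC speed concrete, here is a minimal sketch. The `WifiSample` type, the field names, and the 70% cut-off (taken from my home tests above) are assumptions for illustration, not part of ControlUp.

```python
from dataclasses import dataclass

# Assumed cut-off based on the ~70% figure from home testing; tune per environment.
WEAK_SIGNAL_PERCENT = 70


@dataclass
class WifiSample:
    signal_percent: int     # Wi-Fi signal strength, 0-100
    nic_speed_mbps: float   # currently negotiated Client NIC speed
    max_speed_mbps: float   # best link speed previously seen for this device


def assess_wifi(sample: WifiSample) -> str:
    """Return a rough verdict that pairs signal strength with link speed."""
    speed_ratio = sample.nic_speed_mbps / sample.max_speed_mbps
    verdict = "weak signal" if sample.signal_percent < WEAK_SIGNAL_PERCENT else "signal ok"
    return f"{verdict} ({sample.signal_percent}%), link at {speed_ratio:.0%} of best speed"


print(assess_wifi(WifiSample(signal_percent=55, nic_speed_mbps=433, max_speed_mbps=1300)))
```

Comparing against each device’s own best observed speed, rather than a fixed Mbps value, sidesteps the problem of home users having very different access points.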
For the initial release of Remote DX, the ping timeout for LAN Latency is set to 1000 ms, and any values exceeding that will show N/A in the column.
Assuming the access point has low CPU usage or little network contention, the amount of time taken here is often analogous to how many devices an access point has connected to it (anyone who’s been working from home for the past year while their kids are doing virtual school or playing with their Xbox knows this). If you have an older access point, the wireless router will talk to each device sequentially, with each device taking some amount of time to process both the send and receive signals before the access point talks to the next device.
For each phone, computer, tablet, TV, or other device with Wi-Fi connected to your access point, you will incur a small performance penalty.
Newer Wi-Fi 6 (aka 802.11ax) devices work around this limitation intelligently, but any older devices connected to it will suffer the same performance penalties.
High LAN latency can also be a symptom of an overloaded client device. If CPU and memory are being completely consumed on the client device, you may see high LAN latency along with the other metrics.
Another reason for poor LAN latency metrics is an overloaded access point. Access points are becoming more feature-rich, taking on capabilities like acting as NAS devices or running media-streaming programs. An access point that is busy doing things other than handling your traffic can cause higher LAN latency.
Finally, your users might experience higher LAN latency if they live somewhere with a large number of devices and access points all operating at the same frequency—like in high-density living complexes. Just plain and simple interference can reduce performance, causing an increase in LAN Latency.
True stories of shoddy Wi-Fi performance, by Me:
Over the last few years, I have been incredibly frustrated with the Wi-Fi performance of my 2013 MacBook Pro. Network performance seemed to operate in “waves” of performing just fine, followed by delays and poor performance, then good again, on and on, ad nauseam. I could see these waves by doing a simple ping test. Imagine my surprise when the performance continued to suffer with a new 2018 MacBook! At least I knew it wasn’t the hardware.
The “waves” looked like this:
The cause of all this was the Wi-Fi menulet, which, in versions of macOS prior to 11, would periodically scan the network for available access points. Each time the Wi-Fi menulet did its scan, it would “detach” from the current network and then reattach after the scan.
What’s interesting about ControlUp being a real-time monitoring tool is that we can see these waves in Remote DX!
The sudden spikes in LAN or internet latency as caught by Remote DX occur at the same time as the menulet scanning for available Wi-Fi access points. Very cool. 😎
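If you want to check your own ping log for the same kind of “waves,” a rough sketch like the following can help. The function name and the 3x-median spike rule are my own assumptions, not how Remote DX detects anything.

```python
from statistics import median


def find_latency_spikes(latencies_ms, factor=3.0):
    """Return (spike_indices, gaps) for a series of ping times in milliseconds.

    A sample counts as a spike when it exceeds `factor` times the median of
    the series. `factor` is an assumed, tunable value.
    """
    baseline = median(latencies_ms)
    spikes = [i for i, value in enumerate(latencies_ms) if value > factor * baseline]
    # Gaps between consecutive spikes; near-constant gaps hint at a periodic cause,
    # such as a scheduled Wi-Fi scan.
    gaps = [later - earlier for earlier, later in zip(spikes, spikes[1:])]
    return spikes, gaps


pings = [5, 6, 5, 120, 6, 5, 5, 130, 6, 5, 6, 125]
print(find_latency_spikes(pings))
```

Evenly spaced spikes point toward something running on a timer rather than random interference.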
This is not an all-encompassing list, but I believe these are among the more common home latency problems.
Internet Latency measures the time to ping from the client device to Google’s public DNS server at 8.8.8.8. The ping should be routed to the server closest to the client’s location, giving a fairly good idea of the performance of the internet service provider (ISP).
A higher value here could indicate poor routing is in play or potentially some throttling (e.g. for 5G or LTE routers).
For the initial release of Remote DX, the ping timeout is set to 1000 ms, and any values exceeding that will show N/A in the column.
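Both LAN Latency and Internet Latency share the same 1000 ms timeout rule. As a small illustration of how that rule maps a ping result to what you see in the column, here is a sketch. The function name and the use of `None` for a ping that never returned are assumptions.

```python
from typing import Optional

PING_TIMEOUT_MS = 1000  # timeout stated for the initial Remote DX release


def format_latency(rtt_ms: Optional[float]) -> str:
    """Render a ping result as a column value: a number, or N/A on timeout."""
    # None stands in for "no reply at all" (an assumption of this sketch).
    if rtt_ms is None or rtt_ms > PING_TIMEOUT_MS:
        return "N/A"
    return f"{rtt_ms:.0f} ms"


for result in (12.4, 980.0, 1200.0, None):
    print(result, "->", format_latency(result))
```

The practical takeaway is that N/A does not mean “no data”; it usually means the latency was worse than one second.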
“Total Session Latency” is the average round-trip time of three packets between the server and the client device.
Values here are directly attributable to the experience of your users. With high latency comes higher action response times. Sometimes this is unavoidable: packets traveling between India and the United States simply take time. The speed of light is a limit, and travel time is a real, unavoidable thing.
However, if you have groups of users that work at the same location but have wildly different latency, you could be experiencing a poor routing issue. I worked with a customer who had this exact scenario, and they were able to identify that the ISP for one of their two remote offices had a poor route that added a significant and noticeable increase in latency when compared to the neighboring office.
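One way to spot that kind of mismatch is to compare each user against the median latency for their location. This is only a sketch: the tuple layout, the function name, and the 2x factor are assumptions, and the sample numbers are made up.

```python
from collections import defaultdict
from statistics import median


def latency_outliers(samples, factor=2.0):
    """Flag users whose latency is more than `factor` times their location's median.

    `samples` is an iterable of (user, location, latency_ms) tuples.
    """
    by_location = defaultdict(list)
    for user, location, latency_ms in samples:
        by_location[location].append((user, latency_ms))

    outliers = []
    for location, rows in by_location.items():
        baseline = median(latency for _, latency in rows)
        outliers.extend(
            (user, location, latency)
            for user, latency in rows
            if latency > factor * baseline
        )
    return outliers


# Illustrative, made-up values.
samples = [
    ("ana", "office-a", 40), ("ben", "office-a", 45),
    ("cal", "office-a", 42), ("dee", "office-a", 180),
    ("eli", "office-b", 60), ("fay", "office-b", 65),
]
print(latency_outliers(samples))
```

Using the median keeps one badly routed user from dragging the baseline up and hiding themselves.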
Total Session Latency is similar to the values from the Citrix Network Latency performance counter or the VMware Horizon Performance Tracker, as they are essentially calculating the same thing. Out of the box, Total Session Latency is sampled more frequently than the Citrix Network Latency metric, which defaults to retrieving a value every 20 seconds. This means that Total Session Latency values will more closely align with what the user is experiencing at the moment you see the metric in ControlUp.
To determine whether or not you are providing a good digital experience, you should follow the guidance of your EUC vendor. As an example, Citrix provides the following:
“Good” values are 150 ms and lower.
“Acceptable” values are 150 to 300 ms.
“Poor” values are greater than 300 ms.
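Those bands translate directly into a small classifier. This is a sketch with an assumed function name; the boundary handling (150 ms counts as “Good,” 300 ms as “Acceptable”) is my reading of the ranges above.

```python
def rate_session_latency(latency_ms: float) -> str:
    """Map a session latency in milliseconds to the bands listed above."""
    if latency_ms <= 150:
        return "Good"
    if latency_ms <= 300:
        return "Acceptable"
    return "Poor"


for value in (90, 150, 220, 301):
    print(value, "ms ->", rate_session_latency(value))
```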
Remote DX metadata brings you information related to the session connection (things like the local router IP, client public IP, Wi-Fi SSID and BSSID, etc.). All of this information can help you make informed decisions about the data you see.
For example, when I worked in healthcare, we had a particularly troublesome problem that generated tickets that came to our Citrix team.
At our hospitals we had “Workstations on Wheels” that we called WOWs (or COWs, for “computers on wheels”). A WOW is essentially a full-blown computer with Wi-Fi that rolls around on a cart. These carts had a giant UPS attached to them, so they could be moved freely throughout the hospital without losing your work.
That last part is important.
At one of our hospitals, we started getting tickets about sessions suddenly disconnecting or freezing. Several minutes later, the session would reconnect or unfreeze.
Using ControlUp, we could see the frozen session was either in a “disconnected” or “active” state, but with the Idle Timer increasing each minute. This meant the user was not connected to their Citrix session. We were able to validate the problem.
But without visibility into the network stack, this was all we could do. We could see the session was disconnected (or idle, implying a disconnection) and we were stuck. We brought the network team in to examine what we thought was a network issue, but they maintained that the wireless connections from the devices were stable and they did not see any drop in their wireless connections.
The IT teams were at an impasse: we on the Citrix team said the disconnected sessions were all coming from one location, and the network team said the Wi-Fi was fine and the issue was with Citrix. Of course, users are the ones who suffer in these situations.
Replicating the issue was difficult too—even though it happened several times a day. The only commonality was location.
Eventually, some months and many closed tickets later, one of the wireless access points in this hospital died and was replaced. And just like that, the tickets stopped. No more disconnections were occurring at that site.
In this real-life scenario, I lacked the information to help the network team accurately diagnose this issue. Their own metrics reported everything was fine; the device said it was healthy. It was a ghost in the machine and neither team had the ability to ‘share notes’ with information outside their IT silo. So, the problem sat and persisted.
But with Remote DX metadata, there is now a wealth of information that could have led to correlation and possibly identifying the issue faster.
If we had been able to record the Wi-Fi SSID and Wi-Fi BSSID, we could have identified the EXACT access point where we were seeing the disconnects occur. In my scenario, our organization had a mesh network at the hospital, so the Wi-Fi SSID would have shown that the users were not on the Guest network (if the user didn’t trust the corporate network, they would connect to the Guest network, then log in to Citrix externally). The Wi-Fi BSSID would have helped us identify the access point to which the user was connected when the disconnection occurred. In addition, since almost everyone in the organization used Citrix, with Remote DX we could group by SSID, sub-group by BSSID, and then simply count how many Citrix users were connected to that specific access point.
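The grouping described above is simple enough to sketch outside of ControlUp. Assuming you had exported one (SSID, BSSID) pair per recorded disconnect, counting them might look like this. The function name, SSIDs, and BSSIDs are invented for illustration.

```python
from collections import Counter


def disconnects_by_access_point(events):
    """Count disconnect events per (ssid, bssid) pair, busiest access point first.

    `events` is an iterable of (ssid, bssid) tuples, one per disconnect.
    """
    return Counter(events).most_common()


# Hypothetical export: one row per disconnected session.
events = [
    ("HOSP-CORP", "aa:bb:cc:00:00:01"),
    ("HOSP-CORP", "aa:bb:cc:00:00:02"),
    ("HOSP-CORP", "aa:bb:cc:00:00:02"),
    ("HOSP-CORP", "aa:bb:cc:00:00:02"),
    ("HOSP-GUEST", "aa:bb:cc:00:00:09"),
]
for (ssid, bssid), count in disconnects_by_access_point(events):
    print(ssid, bssid, count)
```

An access point that keeps landing at the top of this list is a concrete lead to hand to the network team.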
With enough samples we could have resolved this issue more quickly and with a higher confidence in the fix.
ControlUp Remote DX brings in metadata that we can use in a security context. Using information such as whether a user is connected to a WPA2 network or an unsecured one, the OS version, or whether the Wi-Fi SSID shows they are not on the corporate network, ControlUp can take actions on these user sessions.
Whether it is automatically sending a message to a user asking them to upgrade their operating system, or automatically disconnecting or logging off users and emailing them to explain that ControlUp detected they were connected to an unsecured network, ControlUp can handle these scenarios with ease!
Metrics in ControlUp can have stress levels and triggers attached to them, either for alerting or to produce automations.
Some examples of automations we are releasing with ControlUp Remote DX:
These automations can be enabled by going into the Triggers feature of ControlUp and ticking the checkbox.
Here is what users will see for each operation:
Notify user when poor Wi-Fi is detected:
Notify user when unsecure operating system is detected:
Notify user when unsecured Wi-Fi access point is detected:
With ControlUp, you can increase your security by logging users off shortly after they’ve connected with these attributes. For example, if you don’t want any users connecting from an unsecured access point, you can simply add a logoff script action to the trigger with a short delay.
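The logic behind such a trigger can be pictured with a short sketch. This is not ControlUp’s trigger API; the `Session` type, the `"Open"` security label, and the 120-second delay are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

LOGOFF_DELAY_SECONDS = 120  # hypothetical grace period before logging the user off


@dataclass
class Session:
    user: str
    wifi_security: str      # assumed labels, e.g. "WPA2" or "Open"
    connected_seconds: int  # how long the session has been connected


def sessions_to_log_off(sessions, delay=LOGOFF_DELAY_SECONDS):
    """Return users on an unsecured network whose grace period has elapsed."""
    return [
        s.user
        for s in sessions
        if s.wifi_security == "Open" and s.connected_seconds >= delay
    ]


sessions = [
    Session("ana", "WPA2", 600),
    Session("ben", "Open", 30),    # still inside the grace period
    Session("cal", "Open", 300),   # past the grace period
]
print(sessions_to_log_off(sessions))
```

The delay gives the user time to read the notification before the session ends.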
This is the most exciting release of ControlUp yet. Remote DX brings exceptional visibility into your work-from-home users’ network performance and surrounding wireless metadata. With this data, you can troubleshoot “remote” network issues with confidence that you have found the root cause and that the fix you applied actually worked. You can now track operating system versions to ensure they are supported by the vendor and able to get patches. You can monitor how long the client device is inactive, as opposed to relying on Citrix or VMware session timers.