How, and Why, We Applied Machine Learning to Cove Continuity, Part 1

Over the next two blog posts, I want to explain how we used machine learning to raise the accuracy of the Cove Continuity boot-check test to 99%.
Cove Continuity can restore source (protected) servers and workstations as virtual machines (VMs) in Hyper-V, ESXi, or Azure. After a VM is restored, Cove runs a boot-check test to prove that the system was restored properly. It then takes a screenshot of the restored VM's boot screen and collects part of the system event log so both can be shown to customers later. It also determines the boot status: successful or failed. A successful boot means the services inside the VM are running and available to customers.
This matters to our customers because they pass these reports directly on to their own end customers. The boot-check test must therefore be robust: it is one of the first things our product is judged by.
Is it colorful enough?
To address this challenge, we tried several approaches. It all started with a very simple idea: when a VM boots successfully, it displays a colorful login screen, whereas a problem is usually shown as an error on a black or blue screen. So we could tell whether a screen was a login screen simply by taking a screenshot and checking its file size, because a colorful, detailed image compresses less than a mostly uniform one. If the screenshot was above a certain size, we treated it as a login screen; if not, it was probably an error screen. Simple enough, right? That is what we did, and it worked well at first.
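The size heuristic can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cove's code: raw RGB pixel bytes are deflate-compressed with `zlib` as a stand-in for an encoded screenshot (PNG also uses deflate), and `SIZE_THRESHOLD`, `looks_like_login_screen`, and the synthetic screens are hypothetical names and values chosen for illustration.

```python
import random
import zlib

# Hypothetical cut-off in bytes; the threshold Cove actually used is not published.
SIZE_THRESHOLD = 50_000


def compressed_size(raw_rgb: bytes) -> int:
    """Approximate a screenshot's file size by deflate-compressing its raw pixels."""
    return len(zlib.compress(raw_rgb, 9))


def looks_like_login_screen(raw_rgb: bytes, threshold: int = SIZE_THRESHOLD) -> bool:
    """Naive heuristic: a detailed, colorful image compresses poorly, so a large
    compressed size is taken to mean "login screen" and a small one "error screen"."""
    return compressed_size(raw_rgb) > threshold


def solid_screen(width: int, height: int, rgb: tuple[int, int, int]) -> bytes:
    """Synthetic single-color screen, e.g. a plain blue error screen."""
    return bytes(rgb) * (width * height)


def noisy_screen(width: int, height: int, seed: int = 0) -> bytes:
    """Synthetic high-detail screen standing in for a colorful login screen."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(width * height * 3))


print(looks_like_login_screen(solid_screen(320, 240, (0, 0, 170))))  # False
print(looks_like_login_screen(noisy_screen(320, 240)))  # True
# The same kind of content at a lower resolution falls under the fixed threshold:
print(looks_like_login_screen(noisy_screen(64, 48)))  # False
```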
At the end of 2019, we introduced Recovery Testing, which lets users test recovery, including the boot check, in an environment hosted by N‑able. It is simple and straightforward to set up through a wizard in our centralized web management console. As a result, the number of restores grew dramatically. Today, we perform more than 60,000 restores a month in our secure environment.
That's when we realized our initial boot-check approach had a flaw. Different machines produce screenshots at different resolutions, and sometimes a blue screen of death contains a lot of text. In other cases, the login screen looks minimalistic and grey, with little more than a few numbers (the current time) on it. In many cases, simply checking the screenshot size did not help: we got both false positives and false negatives.
Injection
At that time, we introduced another approach: when restoring a VM, we injected a tool into it and configured the tool to start automatically. Then, when the VM booted, we attempted to connect to that tool. If the connection succeeded, the tool was running, which in turn meant the OS was running, so we could prove the VM had booted, right?
This approach also came with issues. In Recovery Testing, we do not enable networking, because we do not want to expose services running inside the VM to the outside world. So we used a COM port for that communication instead. However, there were cases when we could not establish the connection due to permissions or other technical issues. And sometimes, even when the OS had started, services were running, and we could communicate with our tool, Windows was still displaying some kind of "Applying computer changes" screen rather than a login screen. Customers want to see a proper login screen.
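The host-side check can be sketched as a request/reply handshake with a timeout. This is a hypothetical illustration, not Cove's protocol: a `socket.socketpair()` stands in for the VM's COM port so the example runs anywhere, and the `PING`/`PONG` messages and the function names are assumptions.

```python
import socket
import threading

PROBE = b"PING\n"  # hypothetical probe sent by the host
REPLY = b"PONG\n"  # hypothetical answer from the injected in-guest tool


def read_line(conn: socket.socket, limit: int = 64) -> bytes:
    """Read bytes until a newline, end of stream, or `limit` bytes."""
    buf = b""
    while not buf.endswith(b"\n") and len(buf) < limit:
        chunk = conn.recv(1)
        if not chunk:
            break
        buf += chunk
    return buf


def guest_agent(conn: socket.socket) -> None:
    """Stand-in for the auto-started tool inside the guest: answer one probe."""
    if read_line(conn) == PROBE:
        conn.sendall(REPLY)


def probe_guest(conn: socket.socket, timeout: float = 2.0) -> bool:
    """Host side: True only if the guest tool answers the probe before the timeout."""
    conn.settimeout(timeout)
    try:
        conn.sendall(PROBE)
        return read_line(conn) == REPLY
    except OSError:  # includes socket.timeout: no answer means "not proven booted"
        return False


host_end, guest_end = socket.socketpair()
agent = threading.Thread(target=guest_agent, args=(guest_end,))
agent.start()
print(probe_guest(host_end))  # True: the tool answered
agent.join()

silent_host, silent_guest = socket.socketpair()  # nothing runs in this "guest"
print(probe_guest(silent_host, timeout=0.2))  # False: the probe timed out
```

A successful handshake shows only that the tool, and therefore the OS, is running; it says nothing about what is on screen, which is why an intermediate Windows screen could still slip through.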
Maybe Microsoft could tell us if it was booted?
To avoid these issues, we also tried analyzing Windows event logs, looking for specific entries indicating that the OS had booted. The problem with that approach is that you are always restoring a VM that was backed up in the past. Its system clock may be wrong, and events may not be logged with the actual time until the clock is synchronized. This makes it hard to tell whether an entry was written by the VM booting now or by the source (backed-up) machine before the backup.
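The ambiguity can be illustrated with a naive timestamp filter. This is a sketch under stated assumptions: event ID 6005 (logged in the Windows System log when the Event Log service starts) is used as an example boot marker, and `LogEntry`, `looks_booted_now`, and all timestamps are hypothetical.

```python
from datetime import datetime
from typing import Iterable, NamedTuple

# Assumption: treat "Event Log service started" (System log, ID 6005) as a boot marker.
BOOT_EVENT_IDS = {6005}


class LogEntry(NamedTuple):
    event_id: int
    written_at: datetime  # time according to the clock of whichever machine wrote it


def looks_booted_now(entries: Iterable[LogEntry], restore_started: datetime) -> bool:
    """Naive check: a boot marker stamped after the restore began means a fresh boot."""
    return any(
        e.event_id in BOOT_EVENT_IDS and e.written_at >= restore_started
        for e in entries
    )


restore_started = datetime(2024, 5, 1, 12, 0)
# Boot marker written on the source machine before the backup was taken.
from_source = LogEntry(6005, datetime(2024, 4, 30, 8, 0))
# Genuine boot of the restored VM, but its clock still reads a backup-era time.
fresh_boot_stale_clock = LogEntry(6005, datetime(2024, 4, 30, 9, 5))
# The same boot after the clock has been synchronized.
fresh_boot_synced_clock = LogEntry(6005, datetime(2024, 5, 1, 12, 3))

print(looks_booted_now([from_source, fresh_boot_stale_clock], restore_started))  # False
print(looks_booted_now([from_source, fresh_boot_synced_clock], restore_started))  # True
```

With a stale clock, the fresh boot marker cannot be told apart by timestamp from the one carried over from the source machine, so the check reports a false negative.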
Hold on. What about ESXi and Azure?
So far, we have mainly discussed two major features: Recovery Testing and Standby Image (SBI) on Hyper-V. In 2023, we brought Standby Image to Azure. Boot check in Azure uses completely different tools and methods: you must rely on Azure services, namely Azure boot diagnostics and the Azure VM agent, to collect a screenshot and detect whether the VM has booted.
Recently, we released SBI for ESXi as well. In the preview version of that feature, we used the old screenshot-size check, because implementing a COM port-like approach for ESXi would require additional effort, and the COM port approach had already proved error-prone. As a result, we ended up with three different, complex approaches for three recovery targets. Not ideal.
Thankfully, we solved this issue. We'll look at our solution in detail in the next blog post.
Sergey Shaminko is Cove Engineering Manager at N‑able
To find out more about Cove Data Protection, visit www.n-able.com/products/cove-data-protection, or simply start a free trial at www.n-able.com/products/cove-data-protection/trial.
© N‑able Solutions ULC and N‑able Technologies Ltd. All rights reserved.
This document is provided for informational purposes only and should not be relied upon as legal advice. N‑able makes no warranty, express or implied, and assumes no legal liability or responsibility for the accuracy, completeness, or usefulness of any information contained herein.
N-ABLE, N-CENTRAL, and other N‑able trademarks and logos are the exclusive property of N‑able Solutions ULC and N‑able Technologies Ltd. and may be common-law marks, or registered or pending registration with the U.S. Patent and Trademark Office or in other countries. All other trademarks mentioned herein are used for identification purposes only and are trademarks (or registered trademarks) of their respective companies.