How, and Why, We Applied Machine Learning to Cove Continuity, Part 2

By Sergey Shaminko

May 23, 2024, 5 min read

If you haven’t already read part 1, do that first.

If you look at the screenshots, they’re actually quite simple to understand. Anyone can tell at first glance whether the OS booted successfully.

So, rather than keep the existing deterministic method, which relied on indirect evidence, we opted to use machine learning and neural networks to analyze and classify screenshots the way a human being would.

Only a few years ago, this might have sounded like science fiction. Today, there are a variety of mature AI/ML tools and technologies ready for practical application. Better still, many of them are open-source and publicly available. So, our job was to look for proper tooling and use it accordingly.

We knew what to do, but we had little experience in the area, so we decided to start with a proof of concept (POC). We evaluated a number of neural networks that can classify images and opted for SqueezeNet, a well-known open-source model that has proved its efficiency in numerous contests.

Preparing the dataset and training the model

Roughly speaking, the SqueezeNet model is a kind of architecture, or algorithm, that can solve a general problem: in this case, classifying images. With some training, however, we can make it solve our specific problem: classifying VM screenshots.

To train the model, we first must provide it with a proper dataset. This allows the model to “know” what we are trying to classify and what we expect as a result. Since Recovery Testing has been in production for several years, we had tons of VM screenshots to use for training. To prepare the dataset, we removed all PII and manually labeled 2,500 screenshots with a specific class, e.g., failed to boot, successfully booted, or loading.
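
A labeled dataset of this kind can be pictured as a manifest that maps each screenshot to one of the three classes. The sketch below is illustrative only: the file names, the class identifiers, and the idea of holding back 20% of the labeled set for a quick sanity check are assumptions, not details from the article.

```python
import random
from collections import Counter, defaultdict

# Class names follow the article; the identifiers themselves are illustrative.
CLASSES = ("failed_to_boot", "successfully_booted", "loading")


def stratified_split(manifest, holdout_fraction=0.2, seed=0):
    """Split (path, label) pairs so each class keeps roughly the same share."""
    by_label = defaultdict(list)
    for path, label in manifest:
        if label not in CLASSES:
            raise ValueError(f"unknown label {label!r} for {path}")
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, holdout = [], []
    for label, paths in by_label.items():
        rng.shuffle(paths)
        cut = int(len(paths) * holdout_fraction)
        holdout += [(p, label) for p in paths[:cut]]
        train += [(p, label) for p in paths[cut:]]
    return train, holdout


# Hypothetical manifest: 2,500 labeled screenshots with made-up file names.
manifest = [(f"screenshot_{i:04d}.png", CLASSES[i % 3]) for i in range(2500)]
train, holdout = stratified_split(manifest)
print(len(train), len(holdout))
print(Counter(label for _, label in holdout))
```

Splitting per class rather than over the whole list keeps rare classes from vanishing out of the held-back portion.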

We used PyTorch to conduct the training, which took around 60 minutes on the above dataset. The result was a 5 MB file of parameters: pretty compact, and it will not eat a lot of RAM. Nice!
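
To put the file size in perspective, a rough back-of-the-envelope calculation helps. It assumes the parameters are stored as 32-bit floats and ignores any file-format overhead; neither detail is stated in the article.

```python
def approx_parameter_count(file_size_bytes, bytes_per_parameter=4):
    """Estimate how many parameters fit in a weights file of the given size."""
    return file_size_bytes // bytes_per_parameter


# Assumption: "5 MB" means 5,000,000 bytes of 32-bit (4-byte) floats.
five_mb = 5 * 1000 * 1000
print(approx_parameter_count(five_mb))  # 1250000
```

Under those assumptions, the file holds on the order of a million parameters, which is small by the standards of image-classification networks.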

The output of training is a set of model weights, or parameters. Once these parameters are loaded into the model, it should be able to classify any set of screenshots, even ones that were not part of the initial training set.
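
Image classifiers typically emit one raw score per class, and the predicted class is the one with the highest score. The sketch below shows that final step in plain Python, assuming a three-score output; the score values and class identifiers are made up for illustration.

```python
import math

# Class names follow the article; the identifiers themselves are illustrative.
CLASSES = ("failed_to_boot", "successfully_booted", "loading")


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the most likely class and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Hypothetical raw scores for a single screenshot.
label, confidence = predict([0.3, 4.1, -1.2])
print(label, round(confidence, 3))
```

The probability can also serve as a confidence signal, for example to flag low-confidence screenshots for manual review.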

Evaluate results and tune parameters

Theoretically, we could have applied that model in production, but before doing that we had to validate the results and, if needed, tune the model parameters. In the first stage, we checked the trained model on 1,000 screenshots that were not part of the training set. In the second stage, we checked it on another 36,000 screenshots.
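
Validation of this kind boils down to comparing the model's predictions with known labels on screenshots it has never seen. A minimal sketch, assuming both are available as plain lists of class names (the sample data is invented):

```python
from collections import Counter


def evaluate(true_labels, predicted_labels):
    """Return overall accuracy and a (true, predicted) -> count table."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    confusion = Counter(zip(true_labels, predicted_labels))
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(true_labels), confusion


# Invented example: ten held-out screenshots, one of them misclassified.
truth = ["booted"] * 6 + ["failed"] * 3 + ["loading"]
preds = ["booted"] * 6 + ["failed"] * 2 + ["loading"] + ["loading"]

accuracy, confusion = evaluate(truth, preds)
print(f"accuracy: {accuracy:.0%}")
for (t, p), n in sorted(confusion.items()):
    if t != p:
        print(f"{n} screenshot(s) labeled {t!r} were classified as {p!r}")
```

The per-pair counts show which classes get confused with each other, which a single accuracy figure hides.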

The results were tremendous. Classifying a screenshot takes at most 1 second on a decent developer workstation with an Intel i7 CPU. More importantly, we now have a mechanism that classifies screenshots with an accuracy of 99%. Not bad!
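
A "maximum time per screenshot" figure can be obtained by timing each call separately and keeping the slowest one. The sketch below shows one way to do that; `fake_classify` is a hypothetical stand-in for the real model call, and the sample data is synthetic.

```python
import time


def worst_case_latency(classify, samples):
    """Return the slowest single-call time, in seconds, over all samples."""
    worst = 0.0
    for sample in samples:
        start = time.perf_counter()
        classify(sample)
        worst = max(worst, time.perf_counter() - start)
    return worst


def fake_classify(screenshot):
    # Stand-in for the real model call; it only burns a little CPU time.
    return sum(screenshot) % 3


# Synthetic "screenshots": 20 byte strings of 25,600 bytes each.
samples = [bytes(range(256)) * 100 for _ in range(20)]
worst = worst_case_latency(fake_classify, samples)
print(f"slowest classification: {worst:.6f} s")
```

Reporting the maximum rather than the average matters here, since a single slow classification can delay a recovery test.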

A summary in numbers:

2.5k screenshots used for training
37k screenshots used for verification
5MB size of the file containing model parameters
1 second max time needed to classify a single screenshot
99% accuracy of the classification

Sergey Shaminko is Cove Engineering Manager at N‑able
