
How, and Why, We Applied Machine Learning to Cove Continuity, Part 2

By Sergey Shaminko

May 23rd, 2024 5 mins

Content

Preparing the dataset and training the model
Evaluate results and tune parameters

If you haven’t already read part 1, we recommend reading it first.

If you look at the screenshots, they’re actually quite simple to understand. Anyone can tell at first glance whether the OS booted successfully. Look at the following examples and you’ll see what I mean:

So, rather than keep the existing deterministic method, which relied on indirect evidence, we opted to use machine learning and neural networks to analyze and classify screenshots the way a human would.

Only a few years ago, this might have sounded like science fiction. Today, there are a variety of mature AI/ML tools and technologies ready for practical application. Better still, many of them are open-source and publicly available. So, our job was to look for proper tooling and use it accordingly.

We knew what to do, but we had little experience in the area, so we decided to start with a proof of concept (POC). We evaluated a number of neural networks that can classify images and opted for SqueezeNet, a well-known open-source model that has proved its efficiency in numerous contests.

Preparing the dataset and training the model

Roughly speaking, the SqueezeNet model is an architecture, or algorithm, that can solve a general problem: in this case, classifying images. With some training, however, we can make it solve our specific problem: classifying VM screenshots.

To train the model, we first must provide it with a proper dataset. This allows the model to “know” what we are trying to classify and what we expect as a result. Since Recovery Testing has been in production for several years, we had tons of VM screenshots to use for training. To prepare the dataset, we removed all PII and manually labeled 2,500 screenshots with a specific class, e.g., failed to boot, successfully booted, or loading.
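A labeled dataset can be as simple as a list of (file, class) pairs. The sketch below is one possible way to organize such a list and keep part of it aside for later checks, with each class represented in roughly the same proportion. The file names, class counts, and the stratified_split helper are invented for illustration; the article does not describe how the labeled files were stored or split.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, holdout_fraction=0.2, seed=0):
    """Split (path, label) pairs so each label keeps roughly the same share."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)
    train, holdout = [], []
    for label, paths in sorted(by_label.items()):
        rng.shuffle(paths)
        cut = int(len(paths) * holdout_fraction)
        holdout += [(p, label) for p in paths[:cut]]
        train += [(p, label) for p in paths[cut:]]
    return train, holdout


# Hypothetical manifest: file names and label counts are made up.
manifest = (
    [(f"shot_{i:04d}.png", "successfully_booted") for i in range(60)]
    + [(f"shot_{i:04d}.png", "failed_to_boot") for i in range(60, 90)]
    + [(f"shot_{i:04d}.png", "loading") for i in range(90, 100)]
)

train, holdout = stratified_split(manifest)
print(Counter(label for _, label in train))
print(Counter(label for _, label in holdout))
```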

We used PyTorch to conduct the training, which took around 60 minutes on the above dataset. The result was a 5MB file of parameters: compact, and it will not eat a lot of RAM. Nice!

As an output, we received model weights, or parameters. If we apply these parameters to the model, it should be able to solve the problem on any set of screenshots, even if they were not part of the initial training set.

Evaluate results and tune parameters

Theoretically, we could have applied that model in production, but before doing that we had to validate the results and, if needed, tune the model parameters. In the first stage, we checked the trained model on 1,000 screenshots that were not part of the training set. In the second stage, we checked it on another 36k screenshots.
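Checking a model on screenshots it has not seen comes down to comparing its predictions with the human labels. The helper below is a minimal sketch of that comparison: it reports overall accuracy and counts each (true, predicted) pair so the misclassified cases stand out. The evaluate function and the sample labels are invented for illustration and are not results from the article.

```python
from collections import Counter


def evaluate(true_labels, predicted_labels):
    """Return overall accuracy and a Counter of (true, predicted) pairs."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    confusion = Counter(zip(true_labels, predicted_labels))
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(true_labels), confusion


# Invented example data, not results from the article.
truth = ["booted", "booted", "failed", "loading", "failed", "booted"]
preds = ["booted", "booted", "failed", "booted", "failed", "booted"]

accuracy, confusion = evaluate(truth, preds)
print(f"accuracy: {accuracy:.3f}")
for (t, p), n in sorted(confusion.items()):
    if t != p:
        print(f"{n} x true={t} predicted={p}")
```

Looking at the off-diagonal pairs, rather than accuracy alone, shows which classes get confused with each other, which is useful when deciding whether more labeled examples or further tuning are needed.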

The results were tremendous. Classifying a screenshot takes at most 1 second on a decent developer workstation with an Intel i7. More importantly, we now have a mechanism that classifies screenshots with 99% accuracy. Not bad!
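A worst-case figure like “1 second max” can be measured by timing each classification call and keeping the slowest one. The sketch below shows the idea with a placeholder classifier; max_latency and fake_classify are hypothetical names, and the sleep stands in for real model inference.

```python
import time


def max_latency(classify, inputs):
    """Call classify on each input and return the slowest wall-clock time in seconds."""
    worst = 0.0
    for item in inputs:
        start = time.perf_counter()
        classify(item)
        worst = max(worst, time.perf_counter() - start)
    return worst


def fake_classify(screenshot):
    # Placeholder for real model inference; sleeps briefly to simulate work.
    time.sleep(0.01)
    return "successfully_booted"


worst = max_latency(fake_classify, range(5))
print(f"slowest call: {worst * 1000:.1f} ms")
```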

A summary in numbers:

2.5k screenshots used for training
37k screenshots used for verification
5MB size of the file containing model parameters
1 second max time needed to classify a single screenshot
99% accuracy of the classification

Sergey Shaminko is Cove Engineering Manager at N‑able

