Don’t be led astray by the common misconception that a successful backup automatically equates to a successful restore. The hard truth is most MSPs don’t perform periodic testing of their clients recurring backups prior to being called upon to perform data recovery. This is when you find yourself in the uncomfortable situation of going into a recovery scenario not knowing if you will emerge successful. However, recovery scenarios don’t have to be that stressful for you or your customers—performing basic recovery testing of their business-critical servers is the key.
Remember, a backup is just a point-in-time copy of the data you selected on a running system. It doesn’t ensure the correct data was selected for the recovered system to be bootable. The minimum requirement for a bootable machine is a successful system state backup and a file system backup of your C:\ volume or boot volume. You must also select any other data volumes and applications that you want to protect. Setting backup filters and exclusions for temp files, media, download directories, dump files, etc. are fine, just don’t exclude items required for your applications to run or that are needed for a successful boot.
A pending OS patch, driver, or application update that’s waiting for a system reboot can further complicate a virtual or bare metal recovery. Because these updates are pending reboot and have not yet been applied at the time of the backup, you don’t know their impact until the first boot after a recovery. If the driver or patch causes a problem after the first boot, you’re potentially stuck unless you can identify and remove the offending update, or you roll back to a known good backup point that’s prior to the update. To mitigate this risk, apply patches and updates frequently and schedule system reboots to ensure you don’t go weeks with an update pending. Additionally, make sure to confirm your default backup retention is at least twice as long as the typical period between patches and scheduled reboots. Finally, setting up monthly or quarterly archive schedules will provide you with long term rollback points should you ever need them.
Viruses and ransomware
Just as a bad update can leave you stranded with an unbootable system, so can ransomware and viruses that manage to infiltrate the system before the next backup runs. If you ever find yourself in a scenario where the restore contains damaged and encrypted files or trojans waiting to trigger, don’t panic. This is simply a case of the backup software doing what it’s designed to do, which is protecting the data that it sees on the selected volumes. Rest assured that even if we do perform a backup of an infected file, it can’t activate or propagate to other files or devices while it’s encrypted on your LocalSpeedVault or in the SolarWinds® Backup cloud.
To proceed with recovery, first check your backup history and your system event logs to determine when the infection likely occurred. Next, clean, wipe, or otherwise neutralize the infection and damaged files from the system. Then select an earlier backup session to use for recovery. If you’re uncertain about modifying, overwriting, or deleting data on ransomed systems, you can always use alternate hardware to perform a bare metal recovery (BMR) or virtualize the restore using virtual disaster recovery. While it’s never advisable to pay the ransom, either of these options will leave you free to do so later if the data proves to be unrecoverable for some reason.
The automated recovery testing function inside of SolarWinds Backup is a chargeable feature that you can use to perform monthly or bi-weekly recovery and boot testing of your important Windows servers and workstations. Think of this as the equivalent of opening your refrigerator door and seeing the light come on. This basic, visual check is normally enough for you to assume your “milk is staying cold” your server will boot when needed. However, it’s still advisable to perform the occasional hands on “date or sniff test” to make sure, as your milk could very well be spoiled.
The recovery console offers much the same set of recovery testing options, at no additional charge, when using your own local hypervisor as the restore target. It also has the added ability of retaining the restored instance and applying only delta daily changes to it. This standby image of your production system could then easily become a failover target with the availability of just a few minutes (i.e., the time it takes to boot a VM).
You backup management console offers a dedicated Recovery testing overview section to configure and monitor all of your paid recovery testing devices, as well as a predefined dashboard view names “VDR status” that is capable of seeing both recovery testing and recovery console configured devices, restore status, and boot screenshots.
Eric Harless is the head backup nerd at SolarWinds MSP. Eric has worked with SolarWinds Backup since 2013 and has 25+ years of data protection industry experience in sales, support, marketing, systems engineering, and product management. You can follow Eric on Twitter at @backup_nerd