I’ve been using the word “effective” a lot lately in training sessions to differentiate between a specific task being done well and the same task done exceptionally well. There’s many ways how I can highlight this type of approach in SQL Server but the most important area that I use to demonstrate this is how we backup our databases. Backups are of course the most critical piece of database maintenance undertaken by DBA’s, without them you are essentially one failure away from total loss of data and without question that risk is very real – it can (and does) happen.
This makes backups one of the first tasks that administrators tend to when setting up a new instance of SQL Server. Backup jobs are created to cover all databases, they’re scheduled in SQL Agent to comply with the organisations recovery requirements and notifications are added to alert the DBA of any job failures.
If disaster does occur then you need to be able to take a backup set and successfully restore it so the question around backups isn’t just, are my databases being backed up, but are they also in a fully restorable state? The fact is that even if backup jobs are running successfully this doesn’t necessarily mean that the backups can be restored in the event of emergency, and that is the last place where you want to be finding that out.
One of the options available within backup maintenance is to perform backup verification where SQL Server ensures that the backup file is in a readable state by performing some checks after the backup has completed (by using a RESTORE VERIFYONLY command). This might sound like just the thing we’re looking for but on its own it backup verification just isn’t enough. Here’s the official wording:
“Verifies the backup but does not restore it, and checks to see that the backup set is complete and the entire backup is readable. However, RESTORE VERIFYONLY does not attempt to verify the structure of the data contained in the backup volumes.”
The restore verify process does check a few important things; if the backup set is complete and readable, checks header fields of pages and also checks if there is sufficient space on the specified backup device but that doesn’t really indicate if the backup is restorable.
In order to improve backup verification we also have to ensure that our backups are taken with the checksum option. In addition to the checks noted above the verification process will also validate both page and backup checksums, essentially checking that the backup file hasn’t been corrupted by the IO subsystem in some way.
It’s easy to see how useful these options are but they are not set by default within native backup maintenance plans inside SQL Server (if you’re using Ola’s script then checksum is default). This means that the options can easily be missed and in some cases, because these checks use additional resource and make backup routines take longer, I have even known them to be deliberately disabled.
I’d rather use up some additional time and resource to know that my backups are in a reasonable condition so if you are using SQL Server maintenance plans then enabling backup checksum and verification is an absolute must or move to using Ola’s scripted solution. It’s also worth noting that from SQL Server 2014 that there is a configuration option in SQL Server to enable backup checksum at the instance level, meaning all backups.
It’s not a default option and has to be set manually and does not require a restart:
Until then, here’s how to enable the option in T-SQL:
sp_configure ‘backup checksum default’,1;
reconfigure with override;
Even at this point your backups can still be vulnerable. Paul Randal has wrote a lot about this, even how some have questioned if consistency checks can be substituted for backup/restore checksums and just in case you’re wondering this isn’t the case, there are still some ways a backup can become corrupt but report a valid checksum.
This means we’re left with only one way to prove if a database backup is restorable and that’s to actually restore it, and perform a consistency check on the newly restored database. This might sound long winded, particularly if you have a large SQL estate so this is where automation really becomes an essential part of the process. Thankfully there’s many ways we can build this type of backup/restore/check mechanism with the likes of T-SQL, SQLCMD or PowerShell, but even better than that, the dbatools PowerShell module has done most of the hard work for you.
Automation brings other benefits aside from not having to spend days and days sitting there manually restoring and checking every single production database. We can also easily implement logging giving us a full history of our database restores and consistency checks, reporting on our maintenance is another critical part. We can also see how long our restores are taking over time. As databases grow the time to complete backup and restore operations is going to take longer but this capacity planning is often not taken into account and we can easily outgrow our original recovery objectives.
As we can see there’s more to effective backups than just taking them. We can improve backup maintenance plans by enabling checksum and verification options but in order to prove a backup is restorable for DR purposes we must actually restore from backup and perform a database consistency check. Using automation greatly simplifies the process and we can implement reporting on our maintenance and perform additional checks to ensure our recovery objectives can still be achieved.
Effective backup and restore strategies are just some of the items we cover in our Starting SQL Server Administration course, perfect for those starting with implementing database maintenance and if you’d like us to health check your current maintenance processes then we offer a comprehensive review as part of our Managed Database Administration service.