Posts

Time is money!

     Your boss keeps talking about RPO (Recovery Point Objective)  and RTO (Recovery Time Objective).  Do you just nod your head like you know what he/she is talking about?  Maybe that scenario just happened and you are searching the internet for what these terms mean.  If so, welcome!  No one likes to think about disasters, but they happen all too often.  Planning for the worst and hoping for the best will keep your data safe and your job even safer. Let’s take some time and explore what RPO and RTO mean, why these things are important, and what you need to do next to be a hero DBA!

RTO (Recovery Time Objective)

     Recovery Time Objective is the amount of time in which your company expects you to have the database fully restored after a disaster.  That is, how much downtime is acceptable for disaster recovery or planned outages.  Each company is different, and most reference RTO in terms of nines. 

     The nines measure for a company that measures 365 days a year, 24 hours a day as follows:

5 9’s – 99.999% (this translates to about 5 minutes of acceptable downtime per year)
4 9’s – 99.99% (this translates at about 52.5 minutes per year and is much easier to achieve)
3 9’s – 99.9%  (this translates at about 8.75 hours per year)
2 9’s – 99% (translates to about 3.5 days a year)

     To decide what RTO is best for your company, you need to take into consideration your data needs.  Not all companies run on a 365/24 schedule.  Some companies only measure downtime between 8am-6pm Monday through Friday, or only on the weekends.   This will drastically change the translation of the 9’s.  Another thing to think about is whether the measured downtown includes time for maintenance or patching, times when the database must be offline.  If maintenance time is eliminated from consideration, meeting the higher 9’s is much easier.

     If your company insists on an RTO of 5 Nines and does not take into consideration maintenance or patching, then you must speak with the persons in charge to discuss the RPO.  It is possible to adhere to the strict 5 minutes of downtime, but the point at which you are able to recover, will definitely be restricted.

RPO (Recovery Point Objective)

     Recovery Point Objective is the level of data or work that is acceptable to lose in the event of a disaster.   Ideally, companies will want ZERO data or work loss.  While that IS achievable, it will all depend on  valid backups and the extent of damage the database suffered at the point of disaster.  

     An RPO of 15 minutes means that the data and work must be recoverable to a point within 15 minutes of the disaster, meaning that it is expected that only 15 minutes of work or data may be lost.  Stop right here and think about your backup  plans and recovery models.  Restoring a database that is in simple recovery model should not take as long as a restoring one in full recovery model. It is important to remember (from previous blog posts), the recovery model dictates how much data you can recover. It is also important to remember, the ability to recover ANY data at all is fully dependent on having valid backups.

Run Book

     Another term you might hear is “Run Book”.  A Run Book is a physical or digital collection of information that is needed to restart the database in case of disaster.  There are many items that should be included in the runbook.  Some of the essential items one should consider having in the runbook are:

  • Server level info, configuration, purpose, etc.
  • List of all databases and applications using them
  • List of agent jobs and proper response to a failure
  • Disaster Recovery process with all contacts, RPO/RTO, etc. required to bring it back (based on level of issue)
  • Security
  • Backup schedules

     When considering a run book, think about what someone would need if they were new to the company and the only person available to restart the database.  What information would that person need?  Making sure your run book is up to date on a regular basis is certainly a great idea!

Preparing for disaster

     Keep in mind that if you prepare for the worst, you will be less likely to be caught off-guard with a manager breathing down your neck asking “WHEN WILL WE BE BACK UP AND RUNNING?!?!”  Do you have any idea how long it will take to restore your database?  If your answer is “no,” I would suggest doing a restore to see how long this takes.  Further, I would suggest making it a habit to perform drills so that you and your team know what to do in the event of a disaster, and exactly how long it takes to get your company back up and running. Having a solid backup schedule, validating those backups, and keeping your company’s expectations in mind, you will be ready to handle any data disaster that may be thrown your way.  

In my last post (A Beginner’s Guide to SQL Server Backups) we discussed the basics of SQL Server Backups.  As Backups are the foundation for Recovery, the next logical discussion should be Recovery Models.  For reference, I have included this Sample Backup Plan from the earlier post:

Sample Backup Plan

Sample Backup Plan

There are Three SQL Server Recovery Models

Simple Recovery Model

This is the most basic of the Recovery models.  It gives you the ability to quickly recover your entire database in the event of a failure; (or if you have the need to restore your database to another server) however, this only recovers data to the end of the last backup.

Thankfully, Differential Backups can be utilized with this recovery model. NOTE: Any changes made after the last Full or Differential Backup will be lost.  Transaction Log Backups cannot be used in Simple Recovery Model.  There is no Point-In-Time recovery in this model (recovering to a specific transaction or point in time).  Looking at the sample backup plan above… if you have an issue occur Saturday afternoon, you will lose any changes in data since your last Differential Backup that ran Friday night.  If you do not opt for Differential Backups and only perform Full Backups once a week, your changes and data will be lost for the full week.  (GASP!)

Why would you ever consider using such a basic, Simple Recovery Model?  Actually, there are a few really good (and perfectly safe) reasons why you choose to use Simple Recovery Model:

  • Your data is not critical and can easily be recreated
  • The database is only used for test or development
  • Data is static and does not change
  • Losing any or all transactions since the last backup is not a problem

Full Recovery Model

This is the most inclusive of the Recovery models; you can recover (depending on your valid backups) data up to the last transaction that was run before the failure. Log backups are a must for this recovery model, including Tail Log Backups (which will be discussed at length in a later post).  Data can be restored to a specific point in time (once again, depending on your backup plan)!  

        • Let’s explore that for a moment using the sample backup plan above.  Say you take Full Backups on Sunday night, Differential Backups every week night, and Transaction Log Backups every half hour.  If you have a failure and must restore data, by using all the backups to the specific time you need to restore, you can recover data at any moment in time!  Using this recovery model, you have a potential of minimal to no data loss (once again…depending on your backup plan and whether your backups are restorable).  You want to be the DBA who is able to restore quickly and recover data to the very last moment before the failure or crash occurred.  YES, we all want to be THAT person!

Bulk Logged Recovery Model

Once again, this Recovery model requires log backups in order to prevent the log file from continually growing.  This is a special purpose recovery model that is not widely used, and Microsoft states that it should only be used “intermittently to improve the performance of certain large-scale bulk operations, such as bulk imports of large amounts of data.”   It allows high performance bulk-copy operations and is only available when in Full Recovery.  Keep in mind that with the Bulk Logged model you can recover to the end of any backup; however, if a bulk transaction has occurred during the last log backup, those transactions will have to be redone.  From the bulk operation on, you can no longer utilize point in time restore. 

OOOOOOOOOO, I dropped the ENTIRE database!

While studying Backups and Recovery, I have been working in SQL Server 2016 AdventureWorks Database (Download Here).  To see if I REALLY understood the process of Backup and Recovery, I made a Copy Only Backup of my original database, made some changes to my database, took Differential Backups, made another Full Backup, made more changes, and made more backups….and then dropped EVERYTHING.  POOF….gone.  While I did do this intentionally, my heart was racing.  Sparkly spots darted in and out of my vision, and my heart pounded in my ears.  Would I be able to get it back?  Would I be able to restore it?  WHAT HAVE I DONE??????

Photo Credit @Lance_LT

I Dropped the DB! Photo Credit @Lance_LT

How do I know that my backups were good?

I firmly believe you learn more from one of your failures than you do from one hundred of your successes.  TEST TEST TEST TEST those backups!  I am afforded some extra wiggle room while I am working on my virtual machine and not actually working on a client’s production database.

So, I did it, I dropped that DB like a hot rock.  I attempted to restore my database in Simple Recovery Model.   My Differentials would not restore.  Somewhere along the way in the backup or restore process, I messed up.   (I will give you a hint at my mistake, I did not restore the Full Backup with NORECOVERY before I tried to restore the Differential Backup with RECOVERY – we will go into Recovery processes in a later post).  

Knowing that I had made a Copy Only Backup before I started monkeying around with the database, I set out to restore faith in myself.  And boom, I restored the Copy Only Backup!  The day was saved!  Well, mostly.  I lost some data changes, but still had the original database, so that was good-ish.

Lessons Learned

At a later date I repeated the steps above and was able to fully recover a database that was previously in Simple Recovery Model complete to the last Differential Backup (noting my mistake and learning from it).  I additionally created another database to test the Full Recovery Model with Full Database Backup, Differential, Transaction Log Backups, also utilizing Tail of the Log backup to get to the very last transaction that was run after the last backup and before my data files AND database were deleted. (This is pretty in-depth and is covered in Recovery Using Tail Log Backup.)

Please come back next time when we will explore ways to validate those backups.

Thank you for reading!