Time is money!

     Your boss keeps talking about RPO (Recovery Point Objective)  and RTO (Recovery Time Objective).  Do you just nod your head like you know what he/she is talking about?  Maybe that scenario just happened and you are searching the internet for what these terms mean.  If so, welcome!  No one likes to think about disasters, but they happen all too often.  Planning for the worst and hoping for the best will keep your data safe and your job even safer. Let’s take some time and explore what RPO and RTO mean, why these things are important, and what you need to do next to be a hero DBA!

RTO (Recovery Time Objective)

     Recovery Time Objective is the amount of time in which your company expects you to have the database fully restored after a disaster.  That is, how much downtime is acceptable for disaster recovery or planned outages.  Each company is different, and most reference RTO in terms of nines. 

     The nines measure for a company that measures 365 days a year, 24 hours a day as follows:

5 9’s – 99.999% (this translates to about 5 minutes of acceptable downtime per year)
4 9’s – 99.99% (this translates at about 52.5 minutes per year and is much easier to achieve)
3 9’s – 99.9%  (this translates at about 8.75 hours per year)
2 9’s – 99% (translates to about 3.5 days a year)

     To decide what RTO is best for your company, you need to take into consideration your data needs.  Not all companies run on a 365/24 schedule.  Some companies only measure downtime between 8am-6pm Monday through Friday, or only on the weekends.   This will drastically change the translation of the 9’s.  Another thing to think about is whether the measured downtown includes time for maintenance or patching, times when the database must be offline.  If maintenance time is eliminated from consideration, meeting the higher 9’s is much easier.

     If your company insists on an RTO of 5 Nines and does not take into consideration maintenance or patching, then you must speak with the persons in charge to discuss the RPO.  It is possible to adhere to the strict 5 minutes of downtime, but the point at which you are able to recover, will definitely be restricted.

RPO (Recovery Point Objective)

     Recovery Point Objective is the level of data or work that is acceptable to lose in the event of a disaster.   Ideally, companies will want ZERO data or work loss.  While that IS achievable, it will all depend on  valid backups and the extent of damage the database suffered at the point of disaster.  

     An RPO of 15 minutes means that the data and work must be recoverable to a point within 15 minutes of the disaster, meaning that it is expected that only 15 minutes of work or data may be lost.  Stop right here and think about your backup  plans and recovery models.  Restoring a database that is in simple recovery model should not take as long as a restoring one in full recovery model. It is important to remember (from previous blog posts), the recovery model dictates how much data you can recover. It is also important to remember, the ability to recover ANY data at all is fully dependent on having valid backups.

Run Book

     Another term you might hear is “Run Book”.  A Run Book is a physical or digital collection of information that is needed to restart the database in case of disaster.  There are many items that should be included in the runbook.  Some of the essential items one should consider having in the runbook are:

  • Server level info, configuration, purpose, etc.
  • List of all databases and applications using them
  • List of agent jobs and proper response to a failure
  • Disaster Recovery process with all contacts, RPO/RTO, etc. required to bring it back (based on level of issue)
  • Security
  • Backup schedules

     When considering a run book, think about what someone would need if they were new to the company and the only person available to restart the database.  What information would that person need?  Making sure your run book is up to date on a regular basis is certainly a great idea!

Preparing for disaster

     Keep in mind that if you prepare for the worst, you will be less likely to be caught off-guard with a manager breathing down your neck asking “WHEN WILL WE BE BACK UP AND RUNNING?!?!”  Do you have any idea how long it will take to restore your database?  If your answer is “no,” I would suggest doing a restore to see how long this takes.  Further, I would suggest making it a habit to perform drills so that you and your team know what to do in the event of a disaster, and exactly how long it takes to get your company back up and running. Having a solid backup schedule, validating those backups, and keeping your company’s expectations in mind, you will be ready to handle any data disaster that may be thrown your way.  

Corruption.  We know it is everywhere.  It is surely a hot-button issue in the news.  If you haven’t given much thought to database integrity, now is the time to sit up and pay attention.  Corruption can occur at any time.  Most of the time database corruption is caused by a hardware issue.  No matter the reason, being proactive on database integrity will ensure your spot as a hero DBA in the face of corruption.

“A single lie destroys a whole reputation of integrity” – Baltasar Gracian

Integrity is one of those words people throw around quite often these days.  The definition of ‘integrity’ is the quality of being honest and having strong moral principles.  How can data have strong moral principles?  Does ‘data Integrity’ mean something different?  Yes, data integrity refers to the accuracy and consistency of stored data. 

Have Backups, Will Travel

When was the last time an integrity check was run on your database?  If you are sitting there scratching your brain trying to find the answer to that question, you may have serious issues with your database and not know it.

“But, I have solid backup plans in place, so this means I am okay.  Right?”

While having a solid backup and recovery plan in place is an absolute must, you may just have solid backups of corrupt data.  Regular integrity checks will test the allocation and structural integrity of the objects in the database.  This can test a single database, multiple databases (does not determine the consistency of one database to another), and even database indexes.  Integrity checks are very important to the health of your database and can be automated.  It is suggested to run the integrity check as often as your full backups are run. 

As discussed in an earlier blog, Validating SQL Server Backups, your data validation needs to take place BEFORE the backups are taken.  A best practice is to run a DBCC CHECKDB on your data to check for potential corruption.  Running CHECKDB regularly against your production databases will detect corruption quickly.  Thus providing a better chance to recover valid data from a backup, or being able to repair the corruption. CHECKDB will check the logical and physical integrity of the database by running these three primary checks*:

  • CHECKALLOC – checks the consistency of the database;
  • CHECKTABLE – checks the pages and structures of the table or indexed view; and
  • CHECKCATALOG – checks catalog consistency. 

Where to Look

Wondering if you have missing integrity checks or if they have ever been performed on your database?  The following T-SQL script will show when/if integrity checks were performed on your databases.  (Bonus) Running this script regularly will help track down missing integrity checks.

If you are looking for the last date the DBCC checks ran, the T-SQL script to use is as follows:


IF OBJECT_ID('tempdb..#DBCCs') IS NOT NULL
DROP TABLE #DBCCs;
CREATE TABLE #DBCCs
(
ID INT IDENTITY(1, 1)
PRIMARY KEY ,
ParentObject VARCHAR(255) ,
Object VARCHAR(255) ,
Field VARCHAR(255) ,
Value VARCHAR(255) ,
DbName NVARCHAR(128) NULL
)

/*Check for the last good DBCC CHECKDB date */
BEGIN
EXEC sp_MSforeachdb N'USE [?];
INSERT #DBCCs
(ParentObject,
Object,
Field,
Value)
EXEC (''DBCC DBInfo() With TableResults, NO_INFOMSGS'');
UPDATE #DBCCs SET DbName = N''?'' WHERE DbName IS NULL;';

WITH DB2
AS ( SELECT DISTINCT
Field ,
Value ,
DbName
FROM #DBCCs
WHERE Field = 'dbi_dbccLastKnownGood'
)
SELECT @@servername AS Instance ,
DB2.DbName AS DatabaseName ,
CONVERT(DATETIME, DB2.Value, 121) AS DateOfIntegrityCheck
FROM DB2
WHERE DB2.DbName NOT IN ( 'tempdb' )
END

The result will look similar to this.  However, let’s hope your results show a date closer to today’s date than my own!  If you see that your databases do not have integrity checks in place, check your backup and recovery plans and double check your agent jobs to see if perhaps the checks were scheduled but were turned off.

Exact Date Last Integrity Check

Recommendations

It is recommended that DBCC CHECKDB is run against all production databases on a regular schedule.  The best practice is to have this automated and scheduled as a SQL Agent job to run as a regular part of maintenance.  More specifically, to run the integrity check directly before purging any full backups.  Doing so will ensure that corruption is detected quickly, which will give you a much better chance to recover from your backup or being able to repair the corruption.

Remember, SQL Server is very forgiving and will back up a corrupt database!  Corrupt databases are recoverable, but they might have data pages that are totally worthless!