Use existing projects
Every department has initiatives under way that let you build live disaster recovery plan testing into the project.
Let's say your company has recently acquired another company and you will be migrating their users and computers into an existing organizational unit (OU) in your domain.
If you plan the project with the appropriate rollback procedures, you could test an authoritative restore of the original OU to make sure your migration procedures are adequate. It might add a few days to your project, but a live test that doesn't impact operations would be worth the extra time.
The key to this method of disaster recovery testing is to take advantage of what you are already doing. By adding a step here and there, you can effectively test your procedures without costly offsite testing.
Learn from your mistakes
Too often we think that when performing an IT disaster recovery plan test, we must plan the test and control the input data in order to have confidence in the output. I have found that the best disaster recovery procedures are found in our daily mistakes.
Every day things go wrong -- a new domain administrator deletes an OU, the Microsoft SQL Server DBA forgets to back up the database schema (which you discover after a production database has been dropped), faulty hardware causes your Exchange server to go down, and so on.
Take the opportunity to do a post mortem after you experience these daily problems and ask the following questions:
When the problem occurred, was the disaster recovery book used to help remedy the situation?
If the answer is no, follow up with two additional questions. First, was the resolution contained within the scope of the disaster recovery plan (DRP)? If not, evaluate how relevant your plan is. If it would only help when there's a complete loss of all systems, it is probably so broad in scope that it won't give you the detailed information you need to handle minor disasters. Second, if the resolution was within scope, why wasn't the plan used? Your disaster recovery plan should be a familiar document to all members of the team. You never know who will be present to help recover after a disaster.
How was the problem identified?
This is important because it helps you understand the formal and informal practices that your team uses to monitor operations. With appropriate monitoring controls, you can make sure that small disasters don't become big ones.
How long did it take to fully recover from the issue?
The "how long" question lets you extrapolate how much time it might take to recover from a larger disaster and whether that is an acceptable time frame. If it took three people two days of 16-hour shifts to recover from an accidental deletion of some Active Directory objects, you need to know whether 96 person-hours was too slow. How many hours would it take if multiple domain controllers at a single site were lost? Would that be acceptable?
Every organization and department will have additional questions to ask, but the idea is the same whether you manage the Active Directory group or the Microsoft SQL Server team: Get the most out of daily events. Lots of small disasters can add up to one fatal error or amount to business as usual, depending on how they are handled.
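The "how long" question above comes down to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration: the function names and the 40-hour budget are hypothetical and not part of any real plan; the figures for people, days and hours come from the Active Directory example.

```python
def person_hours(people: int, days: int, hours_per_day: float) -> float:
    """Total effort spent on a recovery, in person-hours."""
    return people * days * hours_per_day


def is_acceptable(effort_hours: float, budget_hours: float) -> bool:
    """True if the recovery effort stayed within the agreed budget."""
    return effort_hours <= budget_hours


# Figures from the example: three people, two days, 16 hours a day.
effort = person_hours(people=3, days=2, hours_per_day=16)

# HYPOTHETICAL budget; replace it with whatever your organization agreed on.
BUDGET_HOURS = 40

print(f"Effort: {effort} person-hours")
print(f"Within budget: {is_acceptable(effort, BUDGET_HOURS)}")
```

Recording the same numbers after every small incident gives you a baseline to compare against when you estimate recovery from a larger loss.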
IT disaster recovery plan testing can be a large production at an offsite location, but if you leave it as a once-a-year event, you'll miss the point. Daily IT events hold so much value as tests because they offer what a large production can't -- the unexpected. Ask questions, involve the team through verbal testing, take advantage of existing events and evaluate unanticipated errors. You'll be making huge strides in your preparation for a larger disaster.
Russell Olsen is the CIO of a Medical Data Mining company. He previously worked for a Big Four accounting firm. He co-authored the research paper "A comparison of Windows 2000 and RedHat as network service providers." Russell is an MCP and GSNA. He can be reached at firstname.lastname@example.org.
This was first published in May 2007