Previously, we discussed the critical reasons for making operations part of your disaster recovery plan, and I related the three key steps to achieving that goal: (1) Establish your universe, (2) Create your plan, and (3) Integrate it into your daily work.
Create system documentation
After you've defined your universe, the next step in the process is to create system documentation. I say "create" rather than "find" because most existing system documentation was written by a third party taking its best guess, and it most likely does not reflect your company's current configuration.
Here are some quick ways to build or rebuild your documentation:
Centralize: Create a "System" folder: Identify a shared directory and a hard copy binder as your starting point. Begin with an empty directory and binder so that nothing out of date carries over, and add only current documentation that you have reviewed. To determine what should be included, answer this question: "If I hire a new employee, what could I give them to read during their first week to familiarize them with our system?"
A picture is worth 1,000 words: When reviewing highly complex systems of Fortune 500 companies, or trying to keep up with all the moving parts of a startup, I have found that drawing a picture is usually the best place to start. Using all the correct data flow symbols is not as important as drawing the picture. Print out your picture and keep a pen nearby; as you gather each piece of information, take a few moments to draw in the new table, system or interface. You can also offer incentives to your team for submitting drawings to help speed up the process.
Identify your disasters: As you build your system documentation, review it to find your company's weaknesses. Take individual documents and identify any points of failure -- do not ignore items you are already prepared for. The point of this exercise is to determine all possible disasters that might occur. The list could contain anything from a single point of hardware failure, to cross-training too thin to cope with a down system and a missing employee at the same time, to your Active Directory administrator accidentally deleting an OU.
Identify your controls: Take the list of disasters and add a column in which to record your controls. In this column, answer such questions as: "If the Exchange cluster had a storage failure, could we fall back to our hourly snapshots of the database that are stored on a separate server?" After you complete this exercise, you will be left with a list of disasters for which you don't have an answer, and you'll find some answers you gave that you don't much like.
Risk analysis: Review the list of questions that went unanswered, or that weren't answered to your satisfaction, and prioritize them by how likely and how devastating each disaster would be. With the risks prioritized, you can create a plan to mitigate them.
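The disaster list, the controls column and the prioritization step can be kept in something as small as the sketch below. It is only an illustration: the 1-to-5 scales, the numbers assigned to each disaster, the `control_satisfactory` flag and the likelihood-times-impact score are assumptions made for the example (a common scoring convention, not a method prescribed in this article), and the entries simply reuse the scenarios mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disaster:
    description: str
    likelihood: int                     # assumed scale: 1 (rare) to 5 (expected)
    impact: int                         # assumed scale: 1 (minor) to 5 (devastating)
    control: Optional[str] = None       # the "controls" column; None = no answer yet
    control_satisfactory: bool = False  # False = an answer you don't much like

    @property
    def score(self) -> int:
        # Assumed scoring rule: likelihood multiplied by impact.
        return self.likelihood * self.impact


def open_risks(register):
    """Return disasters without a satisfactory control, highest score first."""
    gaps = [d for d in register
            if d.control is None or not d.control_satisfactory]
    return sorted(gaps, key=lambda d: d.score, reverse=True)


# Illustrative entries only; the numbers are made up for the example.
register = [
    Disaster("Exchange cluster storage failure", 2, 5,
             control="Fall back to hourly database snapshots on a separate server",
             control_satisfactory=True),
    Disaster("Active Directory administrator accidentally deletes an OU", 2, 4),
    Disaster("System down while the only trained employee is out", 3, 4,
             control="Call the vendor", control_satisfactory=False),
    Disaster("Single point of hardware failure", 3, 3),
]

for risk in open_risks(register):
    print(f"{risk.score:>2}  {risk.description}  (control: {risk.control or 'none'})")
```

A spreadsheet with the same columns would serve equally well; the point is that every disaster has a row, every row has a control or an explicit gap, and the gaps are sorted so the mitigation plan starts at the top.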
Integrate with your daily work
At this point, you are most likely thinking, "If I didn't have to do my job, I am sure I could find time to do all of this." The biggest obstacle to reducing your risk is executing the steps above. The following activities will help integrate disaster planning into your daily workflow:
Sell: Job number one is to convince yourself that this is important. If you are not convinced that allotting time to these activities will prove beneficial, the rewards will never be realized. You will have to keep selling yourself on the idea, so write down your goals and why you want to achieve them -- that way the sell will be easier when priorities start to shift.
Set aside time: Schedule a time every week that you dedicate to the development of your plan.
Stick to the fundamentals: Don't make a simple analysis complex. Build your plan one system and one data flow at a time.
Have the spirit of your disaster recovery plan with you at all times: Carry around thoughts like "If something bad happens, what is the plan of action, what systems will this affect, and who knows how to fix it?" That will help you address risks early and make your planning process much easier.
While I have never had a fire in the data center that destroyed all of my servers, I have dealt with several disasters caused by upgrades that failed to account for all the system dependencies. Although my team was not as prepared as it should have been, a clear understanding of the high-level system relationships allowed me to focus the team's efforts on the appropriate areas while we worked through the details.
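The questions "what systems will this affect" and "who knows how to fix it" are easier to answer if the high-level system relationships are written down in a form you can query. The sketch below is one hypothetical way to do that: the system names, the `DEPENDENTS` map and the `OWNERS` table are invented for illustration and would be replaced by whatever your own system drawings show.

```python
from collections import deque

# Hypothetical map: each system lists the systems that depend on it directly.
DEPENDENTS = {
    "storage-array": ["database", "file-server"],
    "database": ["billing-app", "reporting"],
    "file-server": [],
    "billing-app": ["customer-portal"],
    "reporting": [],
    "customer-portal": [],
}

# Hypothetical table of who knows how to fix each system.
OWNERS = {
    "storage-array": "storage administrator",
    "database": "database administrator",
    "file-server": "systems administrator",
    "billing-app": "application team",
    "reporting": "application team",
    "customer-portal": "web team",
}


def affected_by(failed, dependents=DEPENDENTS):
    """Breadth-first walk returning every system downstream of the failed one."""
    seen = {failed}          # also guards against cycles in the map
    queue = deque([failed])
    order = []
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


for system in affected_by("storage-array"):
    print(f"{system}: contact {OWNERS.get(system, 'unknown')}")
```

Running the same lookup before an upgrade, rather than after a failure, gives a quick list of the dependencies that the change needs to account for.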
I have found that executing these steps not only addresses disaster recovery risks but has also made me more adept at managing my universe. Change management, product development and integration, and process improvements are much easier if you have good answers from your disaster recovery plan analysis. As you build up your knowledge of the inputs and outputs of your universe, you will be better prepared to make strategic decisions.
Russell Olsen is the CIO of a Medical Data Mining company. He previously worked for a Big Four accounting firm. He co-authored the research paper, "A comparison of Windows 2000 and Red Hat as network service providers." Russell is an MCP and GSNA. He can be reached at email@example.com.