Most system outages are minor. But when a serious or unexpected incident occurs, business continuity documentation can help minimize mistakes, reduce costs and save time.
The best approach for developing business continuity documentation is to make it part of the system development lifecycle. A company must evaluate its documentation during disaster recovery testing and revise it when developers make major application or technical changes to ensure it is accurate and up to date.
For companies that lack comprehensive documentation, a one-time effort is required to create this valuable repository of disaster recovery information. Creating the documentation can be a secondary project to accumulate all the necessary information over time.
Then as system changes take place, or the needs of the enterprise change, the documentation can be revised as part of the change management process.
Storing business continuity documentation
Once business continuity documentation for each critical application is created, administrators should store it in an electronic format. I use dedicated SharePoint documentation libraries because they offer security, check-out controls and autonomy-level approvals. Other storage options include using an intranet site or libraries on network drives.
An important design guideline is to avoid duplicating pre-existing documentation. If a document is maintained in two places, the copies will eventually fall out of sync, and conflicting copies complicate recovery efforts. A better approach is to capture the file names or SharePoint library locations where existing documentation resides and reference them as needed.
When a data center is affected by a disaster, only a small percentage of the staff will travel to the recovery site. Because it's impractical to send subject matter experts off-site, the business continuity document should include contact information so recovery team members who are not familiar with an application know whom to contact and how to reach them.
Creating a disaster recovery template
To speed up and improve recovery efforts, enterprises should create a template that can serve as a guideline for the recovery team as they work to restore a critical business application. The following explains what sections and information should be included in the DR template:
Application system overview -- This section captures the importance of the application and its recovery time frame in relation to other systems. It also records the high-level recovery steps in summary form. Here you will include:
- Application system name
- Recovery ranking: High = 1 day, Medium = 2-3 days, Low = 4+ days
- Number of days users can work without the system
- Brief overview of recovery requirements
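The ranking thresholds above can be sketched in a few lines of Python. This is only an illustration: the function name, the sample applications, and the choice to derive the ranking from the number of days users can work without the system are assumptions, not part of the template itself.

```python
def recovery_ranking(days_without_system: int) -> str:
    """Map tolerable downtime in days to the template's recovery ranking.

    Assumed thresholds, taken from the list above:
    High = 1 day, Medium = 2-3 days, Low = 4+ days.
    """
    if days_without_system < 0:
        raise ValueError("days_without_system must be non-negative")
    if days_without_system <= 1:
        return "High"
    if days_without_system <= 3:
        return "Medium"
    return "Low"


# Hypothetical applications, listed so the most urgent recoveries come first.
applications = {"Payroll": 3, "Order entry": 1, "Archive search": 7}
for name, days in sorted(applications.items(), key=lambda item: item[1]):
    print(f"{name}: {recovery_ranking(days)}")
```

Sorting by tolerable downtime gives the recovery team a rough order in which to restore systems.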
Documentation change control log -- Maintain audit trails and timestamps as changes are made to documentation. Include the following information:
- Date created/created by
- Date last updated/updated by
- Date last reviewed/reviewed by
- Date last tested/tested by
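If the change control log is kept in a structured form, stale documentation can be flagged automatically. The following is a minimal Python sketch under stated assumptions: the class names, the sample people, and the 365-day threshold are hypothetical, and a real log would probably live in SharePoint metadata or a similar store.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ChangeLogEntry:
    """One date/person pair from the change control log."""
    when: date
    who: str


@dataclass
class ChangeControlLog:
    created: ChangeLogEntry
    last_updated: ChangeLogEntry
    last_reviewed: Optional[ChangeLogEntry] = None
    last_tested: Optional[ChangeLogEntry] = None

    def overdue(self, today: date, max_age_days: int = 365) -> List[str]:
        """Return the checks that are missing or older than max_age_days."""
        cutoff = today - timedelta(days=max_age_days)
        problems = []
        checks = (("review", self.last_reviewed), ("test", self.last_tested))
        for label, entry in checks:
            if entry is None or entry.when < cutoff:
                problems.append(label)
        return problems


# Hypothetical log: reviewed recently, never tested.
log = ChangeControlLog(
    created=ChangeLogEntry(date(2007, 3, 1), "A. Author"),
    last_updated=ChangeLogEntry(date(2008, 6, 15), "A. Author"),
    last_reviewed=ChangeLogEntry(date(2008, 9, 1), "R. Reviewer"),
)
print(log.overdue(today=date(2008, 12, 1)))  # prints ['test']
```

A report like this could be run before each disaster recovery test to show which application documents need attention first.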
Business continuity contacts -- Maintain information about all business and technical principals that might be involved in the recovery process, including:
- Disaster recovery coordinator for application
- Application systems manager
- Development team
- DBA and technical contacts
- System owner and executive owner
Contact information for each vendor -- A DR incident may require assistance from software providers to activate products, grant licenses or provide support. Capture the following information in a standard area:
- Software or hardware product
- Vendor name and website
- Mailing address
- Account representative
- Vendor support contact
- Website technical support
- Company account number
- Product license key
Technical library documentation locations -- Point to existing online documentation to help recovery team members who may not be familiar with the application. Include the following information:
- System narrative
- I/O interfaces
- Systems flowchart
- Network/workflow diagrams
- Application source code libraries
- Key input files, screens, reports and output files
Technical recovery information -- This section describes the technical framework in case servers, workstations, communication networks or other resources need to be created from scratch. Here is some of the information to include:
- Mainframe requirements
- Online and batch processing requirements
- Server hardware/server OS requirements
- Server mirroring considerations
- Network topology/network communication requirements
- Web server requirements
- Client hardware and client OS requirements
- Client software and support software requirements
Application security considerations -- Security is often a complex requirement in the recovery process, and it should be well documented. Consider the following areas:
- Windows domain
- Other network security environments
- Firewall and A/V
- Internal and external user groups
- Network and application security
Post-recovery considerations -- For major outages where off-site recovery is required, planning must take place for moving the application back to the recovered data center.
Every minute of downtime represents lost business opportunities. Good documentation helps companies beyond just application recovery efforts. It's beneficial for training and day-to-day support and it's useful when consultants help with special projects. Testing and refining the documentation are always valuable improvement steps.
Although major disasters are thankfully few and far between, they can occur at any moment. For example, at one of my former companies, one of our most important offices was one block from the World Trade Center. After Sept. 11, we had to use our business continuity plan for more than two months.
Finally, it's important to "just do it." Business continuity planning is analogous to an insurance policy that provides protection we hope to never use. However, when it's needed, everyone benefits directly in terms of costs, time and quality from this planning effort.
ABOUT THE AUTHOR
Harry L. Waldron has more than 35 years of experience in the IT profession. A Microsoft MVP, he works as a senior developer for Parsippany, N.J.-based Fairfax Information Technology Services where he provides technical, business and leadership support on key development projects. He writes about security and best practices for several technical forums, including myITforum.com.
This was first published in December 2008