Disaster recovery testing is an investment that every company should make to protect its information and prevent lost business opportunities. Most companies do a good job of backing up their information, but many don't take the next step of adequately testing out the recovery process.
Periodic testing of recovery procedures will add more expense to the IT budget. This may include the cost of spare servers, travel to cold site centers, special communication lines and personnel costs. But when prepared companies face actual recovery challenges, they will be better equipped to restore services efficiently, quickly and accurately.
The severity rather than the frequency of loss is what can be used to justify the additional expenses associated with disaster recovery planning and testing. In a worst-case scenario, information critical to the business may be permanently lost.
On the surface, a Windows application system might seem easy to recover. It typically would reside on a single server and should work OK once the Windows infrastructure is restored. However, there are often secondary applications and special business reference files that interface with the primary business system. Recovery testing is the only way to ensure an application and its associated information can be restored accurately if a true disaster were to strike.
In almost every test, the disaster recovery team finds some area, either large or small, that it could improve upon. Neglecting testing could significantly delay the recovery process, because the team would have to learn how to piece together its Windows environment and applications under highly stressful conditions.
Disaster recovery testing checklist:
Here are some ways that Windows managers can implement and test their disaster recovery plans:
- Gain the support of top management for the entire business continuity process. Work with business areas to rank and prioritize applications, so the most critical systems are restored first. Explore the costs related to lost business opportunities and the length of time each application system can stay offline.
- Set an annual date for reviewing and updating the disaster recovery plan. Set another date for testing, and for any follow-up updates, during the times of the year that are least busy.
- Obtain specific training on how to recover the Windows infrastructure, so that AD, DHCP, WINS, DNS and other infrastructure services will be brought back accurately and reliably. Even more training may be needed for protecting information that resides in NAS or SAN repositories.
- Allow plenty of time to work out production problems, as resources permit. For example, if testing a portion of the plan might take a week, then allow a month of calendar time. That way, administrators can fit the testing in alongside their normal production support.
- Don't try to test everything at once. Select just one critical application or the Windows network environment itself as a starting point. Then, over time, select other applications perhaps on a quarterly or semi-annual basis. As part of the testing process, ensure blade, virtual or clustered Windows servers can be accurately recovered.
- Use principles of continuous improvement based on testing feedback to ensure that plans stay up to date.
- Integrate disaster recovery into the change management process. That way, when new systems are implemented, they are also protected.
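The ranking and phased-testing items in the checklist above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Application` record, its `hourly_loss` and `max_offline_hours` fields, and both function names are hypothetical, and the idea of sorting first by tolerable downtime and then by cost of downtime is one reasonable way to rank systems, not a prescribed method. The figures a company plugs in would come from its own discussions with the business areas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """Hypothetical record for one business system in the recovery plan."""
    name: str
    hourly_loss: float        # assumed estimate of lost business per hour offline
    max_offline_hours: float  # assumed longest outage the business can tolerate


def restore_order(apps):
    """Rank applications so the least outage-tolerant come first.

    Ties on tolerable downtime are broken by the higher hourly loss.
    """
    return sorted(apps, key=lambda a: (a.max_offline_hours, -a.hourly_loss))


def phased_test_plan(apps, per_cycle=1):
    """Spread the ranked applications over successive test cycles.

    A cycle might be a quarter or a half-year; the most critical
    applications land in the earliest cycles.
    """
    ranked = restore_order(apps)
    plan = {}
    for start in range(0, len(ranked), per_cycle):
        cycle_number = start // per_cycle + 1
        plan[cycle_number] = [a.name for a in ranked[start:start + per_cycle]]
    return plan


if __name__ == "__main__":
    # Illustrative figures only; real values come from the business areas.
    sample = [
        Application("payroll", hourly_loss=1000.0, max_offline_hours=72),
        Application("order entry", hourly_loss=5000.0, max_offline_hours=4),
        Application("email", hourly_loss=500.0, max_offline_hours=24),
    ]
    for cycle, names in phased_test_plan(sample).items():
        print(f"Test cycle {cycle}: {', '.join(names)}")
```

Keeping the ranking separate from the schedule means the same list can drive both the restore sequence during a real recovery and the order in which applications are added to the test calendar.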
Disaster recovery planning and testing must be an integral part of the IT management process. A company that neglects this process is like a person who goes without an insurance policy. Through good fortune, that person may avoid problems for a long time. Still, disaster can eventually catch up with anyone who puts off preparing for it.
If you have to postpone a testing period because of special project needs, that may be acceptable. However, don't put planning and testing efforts completely on the back burner.
There are benefits in proactively assessing the risks, taking precautions and investing small amounts of money in protection over time. If the continuity process receives the proper attention, a protected business will be glad it paid a small price today to avoid a much greater expense in the future.
Recovering a Windows-based application takes more than restoring a server from backup tapes. The domain, user accounts, security and network must be reestablished with precision in order to recover the application. To ensure the process will work when needed, thorough disaster recovery testing is your best insurance policy.
Harry L. Waldron has more than 35 years of experience in the IT profession. A Microsoft MVP, he works as a senior developer for Parsippany, N.J.-based Fairfax Information Technology Services where he provides technical, business and leadership support on key development projects. He writes about security and best practices for several technical forums, including myITforum.com.
This was first published in February 2008