The greatest threat to an organization's systems is bugs in commercial software. A common scenario is a system crash or data loss after installing a bad driver that unexpectedly renders the system unbootable. Problems caused by software bugs are a lot more common than problems caused by human error, viruses or natural disasters, although, admittedly, all of these are risk factors at some level.
Is the Windows platform (server or desktop) any better or worse in terms of backup and recovery than other platforms? What are some specific pitfalls Windows administrators should be aware of?
There really aren't any specific pitfalls for Windows houses. I think the Windows platform is on par with what you'll find in the Unix space. If you can find a type of backup and recovery solution for Unix, you're almost certainly going to be able to find it for Windows, and vice versa. Most of these big enterprise players play in both spaces. The RAID (redundant array of inexpensive disks) technology that Veritas provided in Win2k was originally developed for Unix. You'll find the exact same RAID solution in Unix and Win2k. That's just an example of how the big enterprise backup and recovery players are often addressing more than one space.
So what can an organization do to minimize these risks? Is there such a thing as a foolproof disaster recovery plan?
I don't know if there really is such a thing as a foolproof disaster recovery plan. But I definitely believe that the most important elements of any backup and recovery plan are redundancy and mirroring -- copying your data and backing it up in more than one place. Mirroring and redundancy are your first and most important lines of defense. The problem is that redundant solutions are typically expensive and complicated.
Can you talk a little bit more about the types of redundancy solutions available?
I recommend a RAID solution in any backup and recovery plan. There are a lot of RAID solutions out there. Windows NT and 2000 servers have RAID at the software level in the Disk Management utility. But if you can do it, I'd actually recommend implementing a hardware RAID solution.
Why? What's better about a hardware RAID solution?
First of all, the more types of redundancy you have, the better protected you are. While a software RAID solution is a good idea in a lot of cases, your system won't perform as well as with a hardware solution. In a software RAID solution, the operating system is involved in creating the redundancy. The software is doing the work, so it affects performance. Hardware RAID solutions mirror physically through a disk controller, without using system resources or affecting performance. This is a more expensive option, though.
There are also performance advantages from having more than one copy of your data. You can be reading different data from multiple disks at the same time. For example, if I wanted to read a file and only had one disk, reading that file might take 10 seconds. (I'm just throwing a hypothetical number out there.) But if I had a copy of that file on the disk and on the mirror, the first half of the file could be read from the first disk and the second half from the mirror, cutting the retrieval time to 5 seconds.
What are some software solutions?
Web sites might find Microsoft's Network Load Balancing (NLB) Server useful as a redundancy tool. With NLB, you can have up to 32 computers clustered, yet the outside world only sees one computer. Microsoft.com is a good example. They may have up to 32 computers that look like microsoft.com. That is, all these computers have a copy of the microsoft.com Web site on them and can respond to requests. In this type of clustering situation, up to 31 of those 32 computers could fail and you wouldn't lose your data.
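To make the "up to 31 of 32 can fail" point concrete, here is a minimal Python sketch of a group of identical hosts behind one public name. It is a toy model, not how NLB is implemented: the `Host` and `LoadBalancedCluster` names are made up for illustration, and simple round-robin selection stands in for whatever distribution scheme the real product uses.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    healthy: bool = True


@dataclass
class LoadBalancedCluster:
    """Toy model: many hosts with the same content behind one public name."""
    hosts: list
    _next: int = 0  # round-robin cursor (an assumption, not NLB's real algorithm)

    def handle(self, request: str) -> str:
        # Try each host at most once, starting from the cursor,
        # and skip any host that is marked as failed.
        for offset in range(len(self.hosts)):
            index = (self._next + offset) % len(self.hosts)
            host = self.hosts[index]
            if host.healthy:
                self._next = (index + 1) % len(self.hosts)
                return f"{host.name} served {request}"
        raise RuntimeError("no healthy hosts left")


# Hypothetical usage: 32 hosts, 31 of them fail, the site still answers.
cluster = LoadBalancedCluster([Host(f"web{i:02d}") for i in range(32)])
for host in cluster.hosts[:31]:
    host.healthy = False
print(cluster.handle("GET /"))
```

Because every host holds the same copy of the site, any surviving host can answer; the request only fails once all of them are down.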
Then there's Microsoft Cluster Server, a more general purpose clustering solution. This differs from NLB. Instead of multiple computers handling incoming requests, just one does. Let's say you've got two servers (you can have up to four servers with Datacenter Server). One is handling traffic to the database. Only if that first server fails does the second server start handling the traffic going to the database.
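The contrast with NLB can be sketched in a few lines of Python. This is only an illustration of the active/standby idea described above, not Cluster Server's actual mechanism; the `FailoverCluster` class and the `db-a`/`db-b` node names are hypothetical.

```python
class FailoverCluster:
    """Toy active/standby model: one node serves; the next takes over on failure."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # ordered by preference; the first is active
        self.failed = set()

    def active(self):
        # The active node is the first one in the list that has not failed.
        for node in self.nodes:
            if node not in self.failed:
                return node
        raise RuntimeError("all nodes have failed")

    def fail(self, node):
        self.failed.add(node)

    def query(self, sql):
        # Only the active node ever handles database traffic.
        return f"{self.active()} ran: {sql}"


# Hypothetical usage with two servers.
cluster = FailoverCluster(["db-a", "db-b"])
print(cluster.query("SELECT 1"))  # handled by db-a
cluster.fail("db-a")
print(cluster.query("SELECT 1"))  # db-b takes over only after db-a fails
```

Unlike the load-balanced case, the standby does no work until the active server is gone, so it adds availability rather than capacity.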
I've also seen a third-party software program that mirrors across a network, which is even more powerful than typical mirroring software. Instead of having two copies of Drive D on my computer, one copy of that data is going across the network to another computer. With a program like that, you get geographic redundancy. If your office or server catches on fire, you could potentially have this remote copy in a different building or city.
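The idea of keeping a second copy somewhere else can be sketched as follows. This is a rough illustration under stated assumptions, not the product mentioned above: the `mirrored_write` and `read_any` helpers are hypothetical, and two temporary directories stand in for the local drive and a remote share.

```python
from pathlib import Path
import tempfile


def mirrored_write(relative_name: str, data: bytes, replicas: list[Path]) -> None:
    """Write the same bytes under every replica root."""
    for root in replicas:
        target = root / relative_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)


def read_any(relative_name: str, replicas: list[Path]) -> bytes:
    """Return the data from the first replica that still has the file."""
    for root in replicas:
        target = root / relative_name
        if target.exists():
            return target.read_bytes()
    raise FileNotFoundError(relative_name)


# Hypothetical usage: the "local" copy is lost, the "remote" copy survives.
with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as remote:
    replicas = [Path(local), Path(remote)]
    mirrored_write("reports/q1.txt", b"quarterly numbers", replicas)
    (Path(local) / "reports" / "q1.txt").unlink()  # simulate losing the local disk
    print(read_any("reports/q1.txt", replicas))
```

In a real deployment the second root would be a location in another building or city, which is what turns a plain mirror into geographic redundancy.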
What can you recommend to people who can't afford expensive redundancy measures?
Ultimately, you need to protect your data so you can get it back to where you can use it again. Cost can't be an excuse. It needs to be done. That being said, you don't necessarily need to pay for those backups. Some backups are free. Windows NT and 2000 have free backup programs installed. Whether or not they're the best backup programs is questionable. There is better backup software in the marketplace, in my opinion. But if you can't afford it, you can still back up using the free options. There really is no excuse not to take this minimal measure towards protecting your data.
Do you recommend using third-party software for monitoring systems to detect problems immediately when they arise?
Windows NT and 2000 don't come with their own monitoring software. You definitely need to go to a third party for that. This software can range anywhere from $99 to $10,000, though. How much you decide to spend on that software is your prerogative, but I do recommend using it -- absolutely.
How can a systems administrator/IT manager convince higher-ups to invest as much as necessary?
The best way to convince them is to present them with the worst-case scenario of what could happen. I mean... it's insurance. Ask them: Do you have car insurance? If they can see the value in insuring a $20,000 or $40,000 car, then I hope they can recognize the wisdom in having insurance on something as valuable as your company's computer systems.
To join Mark Russinovich for his live expert Q&A, click here.