Special report: Can Windows and Linux peacefully co-exist?
Fourth in a series.
Unless you've been living under a rock, you probably know that for several years, Microsoft has been trying to convince customers that the Windows platform is far superior to Linux. Although I have always personally liked Microsoft products, I began to wonder how Windows stacks up against Linux when it comes to high performance computing, specifically clustering.
Who uses Windows clusters?
As I started researching corporate cluster usage on the Internet, I found that Windows cluster usage is indeed widespread. Even so, it seems that more companies are using Linux clusters at this point. I couldn't find any statistics on the market share of either platform's clustering solution, but my informal research suggested that at least two companies use Linux clustering products for every one that uses Microsoft's.
Why choose one platform over the other?
There are lots of performance comparisons available for download on the Internet, but many of them contradict one another. I believe that these results vary depending on the hardware being used and the tasks being performed. I don't think that either platform is clearly superior to the other. Your organization's cluster platform selection should be based on your environment and on the task at hand.
Microsoft's clustering architectures
Windows Server 2003 actually supports two different types of clustering. One is called network load balancing, which enables up to 32 clustered servers to run a high-demand application to prevent a single server from being bogged down. If one of the servers in the cluster fails, then the other servers instantly pick up the slack.
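To make the idea concrete, here is a minimal sketch of how load-balanced routing with failover might look. It is not Microsoft's actual algorithm. The class name, the host names and the choice to hash the client address are all assumptions made for illustration; only the 32-server ceiling comes from the description above.

```python
import hashlib

MAX_HOSTS = 32  # per-cluster limit cited in the article


class LoadBalancedCluster:
    """Toy model of a load-balanced cluster (hypothetical, for illustration)."""

    def __init__(self, hosts):
        if len(hosts) > MAX_HOSTS:
            raise ValueError(f"a cluster supports at most {MAX_HOSTS} hosts")
        self.hosts = list(hosts)
        self.online = set(hosts)

    def mark_failed(self, host):
        # A failed host simply stops receiving new requests.
        self.online.discard(host)

    def route(self, client_ip):
        # Only hosts that are still online are candidates.
        live = [h for h in self.hosts if h in self.online]
        if not live:
            raise RuntimeError("no hosts are online")
        # Hash the client address so the same client maps to the same host
        # for as long as the set of live hosts does not change.
        digest = hashlib.sha256(client_ip.encode()).digest()
        return live[int.from_bytes(digest[:8], "big") % len(live)]


cluster = LoadBalancedCluster(["web1", "web2", "web3"])
first = cluster.route("10.0.0.7")
cluster.mark_failed(first)          # simulate that server going down
second = cluster.route("10.0.0.7")  # the surviving hosts pick up the slack
```

After the failure, the same client is served by one of the two remaining hosts, which is the behavior the paragraph above describes.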
Network load balancing has been most often used with Web servers, which tend to use fairly static code and require little data replication. If a clustered
The other type of clustering that Windows Server 2003 supports by default is often referred to simply as clustering. The idea behind this type of clustering is that two or more servers share a common hard disk. All of the servers in the cluster run the same application and reference the same data on the same disk. Only one of the servers actually does the work. The other servers constantly check to make sure that the primary server is online. If the primary server does not respond, then the secondary server takes over.
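The takeover logic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class, the node names and the in-memory health flags are hypothetical stand-ins for the real heartbeat traffic and shared-disk arbitration that a production cluster performs.

```python
class FailoverCluster:
    """Toy model of an active/passive cluster (hypothetical, for illustration)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # In a real cluster, health would be learned from heartbeat messages.
        self.healthy = {node: True for node in self.nodes}
        self.active = self.nodes[0]  # the primary does all of the work

    def heartbeat(self):
        """Run one monitoring pass and return the node that owns the workload."""
        if self.healthy[self.active]:
            return self.active
        # The active node did not respond: promote the first healthy standby.
        for node in self.nodes:
            if self.healthy[node]:
                self.active = node
                return node
        raise RuntimeError("no healthy node is available")


cluster = FailoverCluster(["node-a", "node-b"])
cluster.heartbeat()                 # node-a is serving
cluster.healthy["node-a"] = False   # simulate a failure of the primary
cluster.heartbeat()                 # node-b takes over
```

Note that only one node is ever active, which is why this design buys fault tolerance rather than extra performance.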
This type of clustering doesn't really give you any kind of performance gain. Instead, it gives you fault tolerance and enables you to perform rolling upgrades. (A server can be taken offline for upgrade without disrupting users.) In Windows 2000 Advanced Server, only two servers could be clustered together in this way (four servers in Windows 2000 Datacenter Edition). In Windows Server 2003, though, the limit has been raised to eight servers. Microsoft offers this as a solution to long-distance fault tolerance when used in conjunction with the iSCSI protocol (SCSI over IP).
So, what would make someone choose to use a Microsoft cluster? I recently spoke with several friends (who do not work for Microsoft) and asked them why they chose Microsoft clusters. I got a variety of answers. One person told me that his company had subscribed to one of Microsoft's volume licensing agreements, and going with Microsoft just seemed like the thing to do, since everything else running in the organization was based on Microsoft.
Another person told me that his corporate Web site was running on Microsoft's Internet Information Server (IIS). Since the site was coded in Active Server Pages, IIS was the only platform that could natively run the site. As the site grew, the company had no choice but to create a Microsoft-based cluster.
A third person I spoke with explained that his company had originally considered implementing a Linux-based cluster for a particular database application. The company preferred to use Microsoft products because of the level of available support, but the Microsoft platform required a separate Windows Server 2003 license for each cluster node and thousands of dollars' worth of special hardware. The company was willing to spend the bucks, but its IT policy stated that a duplicate machine must be purchased to match any server put in a production environment. The duplicate machine is used in the company's lab for deployment testing. While the company was willing to spend money on a Microsoft server cluster, a duplicate system for the lab was beyond the budget. The problem was eventually solved by bending the rules a little and using VMware to create a test cluster environment on a series of virtual servers.
Linux and Beowulf
As you can see in the section above, price was the major objection to deploying a Microsoft-based cluster. Admittedly, it's hard to justify spending that kind of money for a Microsoft cluster when you can get arguably better performance on a Linux cluster for a fraction of the cost.
Price isn't the only argument for choosing Linux, though. Certain Linux clustering implementations can scale way beyond anything that Microsoft offers. As I explained earlier, Microsoft imposes an eight-server limit for a server cluster and a 32-server limit for network load balancing. In comparison, two years ago, Charles Schwab was using a cluster of 50 Linux servers to perform a financial analysis. According to a company spokesman, it was able to achieve performance similar to that of a supercomputer, but for far less money.
At the moment, the prevalent type of Linux cluster is called Beowulf. Similar to Microsoft's network load balancing solution, Beowulf relies on parallel processing. Beowulf does have its differences, though.
As you will recall, Microsoft's network load balancing allows the cluster to run multiple instances of a common application. In contrast, Beowulf works best when a single large job can be divided into pieces that each node works on largely independently, with little communication between nodes. In Microsoft's network load balancing, the main goal is to increase scalability so that the cluster can service more people than a single server ever could. A Beowulf cluster is better suited to mathematically intensive operations in which the processing power of multiple servers can be used to arrive at a solution faster than a single server could.
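A small sketch may help show what "using multiple servers to arrive at a solution faster" means in practice. Real Beowulf clusters typically rely on a message-passing library running across separate machines; in the example below, worker threads on one machine merely stand in for cluster nodes, and the function names and the sum-of-squares workload are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one leftover item.
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum_of_squares(bounds):
    """The work a single node performs on its own slice of the problem."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def cluster_sum_of_squares(n, nodes=4):
    """Hand each simulated node one chunk, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(partial_sum_of_squares, split_range(n, nodes)))


total = cluster_sum_of_squares(1_000_000, nodes=4)
```

Each node needs only its own start and stop values, and the only communication is the final combination of partial sums. Problems that divide this cleanly are the ones where adding nodes pays off most.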
As you can see, whether either platform's clustering is superior is debatable. What is clear is that you will get the best results if you choose the cluster platform best suited to the task at hand.
Brien M. Posey, MCSE, is a Microsoft Most Valuable Professional for his work with Windows 2000 Server and IIS. He has served as the CIO for a nationwide chain of hospitals and was once in charge of IT security for Fort Knox. As a freelance technical writer he has written for Microsoft, CNET, ZDNet, TechTarget, MSD2D, Relevant Technologies and other technology companies.