As SharePoint becomes a more critical component of your infrastructure, it also becomes a service that needs to be monitored more closely to ensure its availability. But many organizations mistakenly treat SharePoint like other server applications, such as SQL Server. The reality is that SharePoint can be far more complicated.
The most common misconception is that SharePoint is a single thing to monitor. This false belief leads most operations personnel to overlook the fact that SharePoint is really a combination of services working together. Because these services are tightly integrated, your monitoring strategy needs to be multidimensional.
So what do you monitor? Start with SharePoint's core dependencies.
SharePoint depends on services that provide different functions to its core services. If any of these core dependencies fails, SharePoint will either stop working or suffer a serious drop in your farm's ability to serve clients. Above all, you need to ensure that all of the following services are available and that SharePoint is able to use them:
Windows Server
Without Windows, SharePoint wouldn't work at all, so you must be vigilant and monitor all event logs to ensure that the OS is performing as expected. Event logs are also a good early warning system: errors logged there often point to issues with your SharePoint infrastructure, such as patches that haven't been applied.
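As a minimal sketch of the filtering involved, the Python below takes event records that have already been exported from the Application log (for example, to CSV) and counts Error entries from SharePoint-related sources. The column names, the `SHAREPOINT_SOURCES` set, and the `summarize_errors` helper are assumptions for illustration; check the source names your own servers actually log before relying on them.

```python
import csv
import io
from collections import Counter

# Assumed event source names for WSS 3.0 / MOSS 2007; verify against your own logs.
SHAREPOINT_SOURCES = {
    "Windows SharePoint Services 3",
    "Windows SharePoint Services 3 Search",
    "Office SharePoint Server",
    "Office Server Search",
}

def summarize_errors(rows, sources=SHAREPOINT_SOURCES):
    """Count Error-level events per (source, event id) for the given sources.

    `rows` is an iterable of dicts with at least the keys "Level", "Source"
    and "EventID" (a hypothetical export format).
    """
    counts = Counter()
    for row in rows:
        if row["Level"] == "Error" and row["Source"] in sources:
            counts[(row["Source"], row["EventID"])] += 1
    return counts

if __name__ == "__main__":
    # Illustrative sample rows standing in for an export, not real log data.
    sample = io.StringIO(
        "Level,Source,EventID\n"
        "Error,Windows SharePoint Services 3,1000\n"
        "Warning,Office SharePoint Server,2000\n"
        "Error,Windows SharePoint Services 3,1000\n"
        "Error,MSSQLSERVER,3000\n"
    )
    for (source, event_id), n in summarize_errors(csv.DictReader(sample)).most_common():
        print(f"{n} x {source} event {event_id}")
```

Grouping by source and event ID keeps a recurring error from drowning out new ones, which matters when the same failure repeats every few minutes.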
Active Directory
Whether you use forms-based authentication or Active Directory (AD) to authenticate users, Active Directory is required for server-to-server communication. SharePoint uses several service accounts and needs to communicate with AD to validate the credentials it uses for those accounts. The critical accounts include the SharePoint service account and the search account. SharePoint applications may also have unique accounts.
If SharePoint can't authenticate these accounts, the farm will come screeching to a halt pretty quickly. If you use AD as your authentication provider, it also becomes a critical source of profile data: SharePoint uses the search account to gather that data periodically and add it to SharePoint's profile store.
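One common way service-account authentication breaks is an expired password. A minimal sketch, assuming you can read each account's `pwdLastSet` attribute from AD and that accounts are subject to a maximum password age (42 days is the default domain policy value; yours may differ), is to convert that Windows FILETIME value and warn ahead of expiry. The account names and the `accounts_at_risk` helper are hypothetical; accounts set to "password never expires", or with `pwdLastSet` of 0 (must change at next logon), need separate handling.

```python
from datetime import datetime, timedelta, timezone

# pwdLastSet is a Windows FILETIME: 100-ns ticks since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME value to a timezone-aware datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

def days_until_expiry(pwd_last_set: int, max_age: timedelta, now: datetime) -> float:
    """Days left before the password expires under the given maximum age."""
    expires = filetime_to_datetime(pwd_last_set) + max_age
    return (expires - now) / timedelta(days=1)

def accounts_at_risk(accounts, max_age=timedelta(days=42), warn_days=14, now=None):
    """Return (name, days_left) for accounts expiring within `warn_days`.

    `accounts` maps a (hypothetical) service account name to its pwdLastSet
    value; max_age is assumed to match your domain's password policy.
    """
    now = now or datetime.now(timezone.utc)
    at_risk = []
    for name, pwd_last_set in sorted(accounts.items()):
        left = days_until_expiry(pwd_last_set, max_age, now)
        if left <= warn_days:
            at_risk.append((name, round(left, 1)))
    return at_risk
```

A two-week warning window is an arbitrary choice here; the point is to change the password and update SharePoint's stored credentials before the farm notices.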
SQL Server
Everything in SharePoint is stored in SQL Server databases. As a result, if SQL Server stops functioning, SharePoint will quickly follow. So you must ensure not only that SQL Server is functional but also that the SharePoint Web front-end servers can communicate with it. Make sure the service accounts SharePoint uses have appropriate access to the various databases, primarily the SharePoint_Config database and secondarily the content databases associated with each Web application.
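A first, coarse check you might run from each Web front end is whether the SQL Server listener is reachable at all. The sketch below assumes a default instance on TCP port 1433; named instances usually listen on a dynamic port that clients discover through the SQL Server Browser service, so adjust the port accordingly. The `sql_port_reachable` name and the host in the usage note are hypothetical.

```python
import socket

def sql_port_reachable(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only proves the network path and the SQL Server listener are up.
    It does not prove that SharePoint's service accounts can log in or that
    they have the right permissions on SharePoint_Config or the content databases.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, unreachable network
        return False
```

For example, `sql_port_reachable("sqlserver01")` run on each front end tells you whether that server can reach the database server; a permissions check under the service account's identity is still needed on top of it.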
Internet Information Services (IIS)
SharePoint is a Web-based application server. It relies on the IIS configuration and on the service being fully operational. Depending on the complexity of your SharePoint environment, IIS will be configured with one or more application pools and one or more IIS websites.
There may not be a one-to-one relationship between application pools and websites, but Central Administration and the Shared Services Provider will probably have dedicated application pools (or at least they should in a production environment). Other Web applications, such as your main portal site or your My Sites application, may or may not have dedicated application pools. The lack of a one-to-one mapping is especially likely if you have "extended" Web applications, where a single Web application is served through more than one IIS website.
Take care to map out the relationships and the corresponding application pool identities. Keeping tabs on their status and response to requests will be key to ensuring SharePoint is available to your end users.
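One way to act on that map is to keep it as data and roll probe failures up by application pool, so several dead sites point to one failing pool and identity. In this sketch the URLs, pool names, and account names in `WEB_APPS` are hypothetical placeholders. The probe sends an anonymous request, so a site that uses Windows authentication will typically answer 401, which still shows that IIS and the pool responded; a 503 typically means the pool itself is down.

```python
import urllib.error
import urllib.request
from collections import defaultdict
from typing import Optional

# Hypothetical inventory: Web application URL -> (application pool, pool identity).
WEB_APPS = {
    "http://portal.example.com/": ("SharePoint - Portal", r"CONTOSO\svc_portal"),
    "http://mysites.example.com/": ("SharePoint - Portal", r"CONTOSO\svc_portal"),
    "http://spadmin.example.com:8080/": ("SharePoint Central Administration v3", r"CONTOSO\svc_farm"),
}

def probe(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for `url`, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout

def failing_pools(web_apps, probe_fn=probe):
    """Group unhealthy URLs by (application pool, identity).

    A URL counts as unhealthy if it does not answer or answers with a 5xx status.
    """
    down = defaultdict(list)
    for url, (pool, identity) in web_apps.items():
        status = probe_fn(url)
        if status is None or status >= 500:
            down[(pool, identity)].append(url)
    return dict(down)
```

Passing `probe_fn` as a parameter keeps the roll-up logic testable without a live farm; calling `failing_pools(WEB_APPS)` against real URLs returns only the pools that need attention.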
Beyond dependent services, there are a number of direct SharePoint components that are worth monitoring. Like dependent services, all of the following are more or less required for SharePoint's operation:
SharePoint's Timer Service
The "automatic" things that SharePoint does really aren't all that automatic. In fact, many of the processes that occur in a SharePoint environment are the result of the Timer Service and the related timer jobs defined in central administration.
The Timer Service is a true Windows service that is responsible for initiating these jobs. Many of its jobs run on short cycles, often every five minutes, but the service can shut down or become inoperable for several reasons. Most often the cause is an authentication problem with the identity it runs under. This is a service you should monitor: if it stops, the outage probably won't be noticed immediately, but when users don't get their alerts, you'll get calls.
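Because the Timer Service is an ordinary Windows service, its state can be read with the built-in `sc query` command. The sketch below assumes the WSS 3.0 / MOSS 2007 service name `SPTimerV3` (confirm it in the Services console) and parses the `STATE` line of the command's output; the helper names are hypothetical. A RUNNING state does not prove jobs are succeeding, since an identity problem can leave the service up while jobs fail, so pair this with the event log check.

```python
import re
import subprocess
import sys
from typing import Optional

# Assumed service name of the WSS 3.0 / MOSS 2007 timer service.
TIMER_SERVICE = "SPTimerV3"

# Matches lines such as "        STATE              : 4  RUNNING".
STATE_RE = re.compile(r"^[ \t]*STATE[ \t]*:[ \t]*\d+[ \t]+(\w+)", re.MULTILINE)

def parse_sc_state(output: str) -> Optional[str]:
    """Extract the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = STATE_RE.search(output)
    return match.group(1) if match else None

def timer_service_state(service: str = TIMER_SERVICE) -> Optional[str]:
    """Query the local Service Control Manager through sc.exe (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("sc.exe is only available on Windows")
    result = subprocess.run(["sc", "query", service], capture_output=True, text=True)
    return parse_sc_state(result.stdout)
```

Anything other than `RUNNING`, including `None` when the service can't be found, is worth an alert.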
SharePoint Search Services
Even with Windows SharePoint Services (WSS) alone, SharePoint runs a search service. The WSS search service and the add-on search in Microsoft Office SharePoint Server (MOSS) are used for indexing content as well as importing profile data. If these services are inoperable or only intermittently available, you'll almost immediately begin seeing issues with your farm, ranging from event log entries to broken search to failed profile imports.
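A simple complement to watching the search services themselves is a freshness check: if the last successful crawl or profile import is older than its schedule allows, something is wrong even when the service reports it is running. This sketch assumes you already collect those timestamps (for example, from the crawl log in the Shared Services Provider's search administration pages); the operation names and the `MAX_AGE` thresholds are hypothetical and should mirror your own schedules.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; align them with your own crawl and import schedules.
MAX_AGE = {
    "full crawl": timedelta(days=7),
    "incremental crawl": timedelta(hours=4),
    "profile import": timedelta(days=1),
}

def stale_operations(last_success, max_age=MAX_AGE, now=None):
    """Return operations whose last successful run is missing or too old.

    `last_success` maps an operation name to a timezone-aware datetime;
    a missing entry means it has never succeeded (or wasn't collected).
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for name, limit in max_age.items():
        finished = last_success.get(name)
        if finished is None or now - finished > limit:
            stale.append(name)
    return stale
```

Treating a missing timestamp as stale is deliberate: a collector that silently stops reporting should raise the same flag as a failed crawl.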
If you want to keep your users happy and SharePoint humming along, create a program for monitoring the items identified here. What hasn't been addressed are applications built on SharePoint or key add-on tools such as backup and recovery programs. All of these elements should be monitored as well. Talk to your vendors to understand how these components can be included in your monitoring program.
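To make the monitoring multidimensional rather than a single up/down flag, one option is to register each item as a named check and roll the results up so the farm is only as healthy as its worst dependency. The sketch below is a hypothetical skeleton, not any product's API: `Status`, `CheckResult`, `run_checks`, and `farm_status` are illustrative names, and vendor-supplied checks (backup, custom applications) would simply be more entries in the registry.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, List

class Status(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2

@dataclass
class CheckResult:
    name: str
    status: Status
    detail: str = ""

def run_checks(checks: Dict[str, Callable[[], CheckResult]]) -> List[CheckResult]:
    """Run every registered check; a check that crashes is reported as CRITICAL."""
    results = []
    for name, check in checks.items():
        try:
            results.append(check())
        except Exception as exc:  # the failure to check is itself a finding
            results.append(CheckResult(name, Status.CRITICAL, f"check failed: {exc}"))
    return results

def farm_status(results: List[CheckResult]) -> Status:
    """The farm is only as healthy as its worst dependency."""
    return max((r.status for r in results), default=Status.OK)
```

Catching exceptions per check keeps one broken probe (say, an unreachable backup server) from hiding the results of all the others.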
SharePoint Content Databases
There's a known performance limit for SharePoint's content databases: in general, you shouldn't let them exceed 100 GB. Your site won't come crashing to a halt if you exceed that limit, but you will begin to notice performance degradation. Content databases are associated with a Web application, and a single Web application can have multiple content databases.
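A minimal sketch of tracking that guideline, assuming you pull data-file sizes from SQL Server's `sys.master_files` catalog view (which reports `size` in 8-KB pages) through whatever client you already use, is to convert pages to gigabytes and flag databases approaching 100 GB. The 80 percent warning ratio and the `classify` helper are assumptions for illustration, and "GB" here is computed in 1,024-based units.

```python
# Query you might run with your own SQL client; sys.master_files reports
# size in 8-KB pages. Only data files (ROWS) are counted, not logs.
SIZE_QUERY = """
SELECT DB_NAME(database_id) AS name, SUM(CAST(size AS bigint)) AS pages
FROM sys.master_files
WHERE type_desc = 'ROWS'
GROUP BY database_id
"""

LIMIT_GB = 100.0  # the guideline discussed above
WARN_RATIO = 0.8  # assumption: start warning at 80% of the limit

def pages_to_gb(pages: int) -> float:
    """Convert 8-KB pages to GB (8 KB per page, 1024 * 1024 KB per GB)."""
    return pages * 8 / (1024 * 1024)

def classify(sizes_in_pages, limit_gb=LIMIT_GB, warn_ratio=WARN_RATIO):
    """Map each database name to (size in GB, 'ok' | 'warning' | 'over limit')."""
    report = {}
    for name, pages in sizes_in_pages.items():
        gb = pages_to_gb(pages)
        if gb > limit_gb:
            level = "over limit"
        elif gb >= limit_gb * warn_ratio:
            level = "warning"
        else:
            level = "ok"
        report[name] = (round(gb, 1), level)
    return report
```

For example, a database of 13,107,200 pages is exactly 100 GB by this arithmetic; a "warning" result is the cue to plan a new content database for the Web application before performance starts to slip.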
Shawn Shell is the founder of Consejo Inc., a consultancy based in Chicago that specializes in Web-based applications, employee and partner portals, as well as enterprise content management. He has spent more than 19 years in IT, with the last 10 focused on content technologies. Shell is a co-author of Microsoft Content Management Server 2002: A Complete Guide, published by Addison-Wesley, and the lead analyst/author on the CMSWatch SharePoint Report 2009.
This was first published in August 2009