Let’s begin by examining specific types of problem content. By “problem,” I mean that these forms of content can bloat the SharePoint database -- perhaps not reducing performance, but definitely increasing your storage costs and making tasks like backup and recovery more complicated. We’re not going to take the “easy” route and simply say, “don’t store this information in SharePoint;” our goal is to use SharePoint the way it’s meant to be used -- but to do so with a bit more control over our storage utilization.
Large content items
First and foremost are the file attachments stored in SharePoint, which I’ll refer to as large content items. Word documents, PowerPoint presentations, Excel spreadsheets, Photoshop illustrations, Acrobat PDFs, you name it. Traditionally, we would just have dumped these onto a file server and let people access them from there, but with a file server, we’re not getting integrated enterprise-wide searching, nor are we getting version control -- which could certainly be beneficial for at least some of the files in your environment.
SharePoint offers those features, but these large items can take up a lot of room in the database, increasing your storage costs. In addition, as I’ve already mentioned, SQL Server isn’t necessarily at its best when working with these large content items; if there were a way to move them outside the database -- and still have them be “inside” SharePoint, of course -- then we could perhaps improve performance a bit as well as optimize our storage.
Shared folders and media files
The information in shared folders clearly qualifies as “large content items,” so all the caveats I described in the previous section still apply. Media files -- audio and video files -- fall under the same category, as video files in particular can be very large.
But they have some unique problems above and beyond their mere size. Simply getting this content into SharePoint can present an enormous challenge: You need to locate the data, copy it into the SharePoint database, create the necessary SharePoint items to provide access to the data, and -- perhaps most importantly -- apply the appropriate permissions to the content so that SharePoint’s access permissions reflect the original permissions of each file.
You’ll be adding considerable size to your SharePoint database in the process, of course, but you’ll get the advantages of SharePoint’s features, including permissions management, workflows, alerts, and versioning, along with indexing and search. Figure 1.4 illustrates the logical migration process.
There are a number of vendors who offer tools to assist with, and automate, this kind of data migration. However, be aware that this kind of migration isn’t always the optimal way to use SharePoint, at least in terms of storage optimization.
Figure 1.4: Migrating content into SharePoint.
Notice that the source repository for these migrations can come in a number of forms: typical Windows file servers, of course, but also cloud-based storage or even FTP servers. The basic idea is that any file, no matter where it’s located, can become more valuable and collaborative once it’s inside SharePoint -- assuming, of course, that you want to devote enough storage to keeping it all in the repository, or that you have another way of incorporating the information without actually migrating it into the database.
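Before committing to a migration like this, it can help to know how much data you would actually be adding to the database. The following is a minimal Python sketch of the “locate the data” step only, assuming the source is a folder reachable through the file system; the names `InventoryItem`, `inventory_folder`, and `total_size_mb` are hypothetical and not part of any SharePoint tooling. Copying the content, creating SharePoint items, and mapping permissions are deliberately left out.

```python
import os
from dataclasses import dataclass


@dataclass
class InventoryItem:
    """One source file that a migration would bring into SharePoint (hypothetical type)."""
    path: str        # full path of the source file
    size_bytes: int  # size on disk, roughly what the file would add to the database


def inventory_folder(root: str) -> list[InventoryItem]:
    """Walk a source folder and record every file along with its size."""
    items = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            items.append(InventoryItem(full_path, os.path.getsize(full_path)))
    return items


def total_size_mb(items: list[InventoryItem]) -> float:
    """Total size of the inventoried files in megabytes (1 MB = 1024 * 1024 bytes)."""
    return sum(item.size_bytes for item in items) / (1024 * 1024)
```

Calling `total_size_mb(inventory_folder(some_share_path))` gives a rough lower bound on the storage the migration would consume; versioning and metadata inside SharePoint would add to that figure.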
Why in the world would we want to include FTP- or cloud-based content in our SharePoint infrastructure? Simple: There are a number of good business reasons to include the “primary copy” of content in a cloud-based storage system, on an FTP server, or elsewhere. Recoverability is one reason: Cloud-based storage can offer better protection against deletion or failure.
Accessibility is another reason: We might need others to access the data, and cloud- or FTP-based storage both offer easy ways for anyone in the world to get at the information.
Sometimes data in a cloud- or FTP-based storage system might be someone else’s data that our company has access to; being able to include that in SharePoint would make it easier for our users to access, without requiring us to actually “own” the data.
So there are definitely situations where we would want to bring in content from a cloud-based storage system, or even an FTP server, without actually “migrating” that data to live entirely within SharePoint. This may be a tricky requirement, as most of SharePoint’s features typically require content to “live” in the database, but by identifying this as a potential need, we can be on the lookout for a solution, technology, or trick that might let us meet that need.
Aside from the storage implications, there might seem to be one other significant downside of moving content into SharePoint: retraining your users. For years, you’ve taught them to use mapped drives, or possibly even UNC paths, to get to their shared files. Now, they have to learn to find their files inside SharePoint document libraries. Newer versions of Office can help alleviate the retraining pain because users can directly access documents from those libraries, but for non-Office files -- or if your users are used to older versions of Office -- there’s still some retraining to be done. There’s good news, though: Usually, retrained users have an easier time working with documents that are in SharePoint, so there’s definitely a benefit associated with that retraining investment.
I want to spend a few moments discussing the specific challenges associated with streaming media -- meaning audio and video. First, these files tend to be large, meaning they’ll take up a lot of space in SQL Server and place a greater demand on SQL Server to retrieve them from the database. They can also place burdens on SharePoint’s Web Front End (WFE) servers, because those Web servers have to retrieve the content from the database and stream it -- in a continual, literal stream of data -- to users. In fact, this kind of media content is the one thing I often see companies excluding from SharePoint, simply out of concern for what it will do to SharePoint’s performance. This book will have a specific goal of addressing this kind of content, and identifying ways to include it in SharePoint without creating a significant database or WFE impact.
Dormant or archived content
Perhaps one of the biggest drains on your SharePoint storage is old content that’s no longer needed for day-to-day use or that hasn’t been used in a significant period of time but still can’t be permanently deleted. Most organizations have a certain amount of data that qualifies as “dormant” or “archival.” This is particularly the case for organizations that have a legal or industry requirement to retain data for a certain period of time.
Even if all you have in the way of shared data is file servers, you probably know that the majority of the files they store aren’t used very frequently. Think about it: If your SharePoint servers only needed to contain the data that people actually accessed on a regular basis, the database probably wouldn’t be all that large. The problem is that you also need to maintain a way to access all that dormant and archival data -- and that is often where SharePoint’s biggest share of storage utilization comes from, especially when that dormant or archived data consists of large content items like file attachments. It’d be great to pull that information out of SharePoint, but then it would no longer be indexed and searchable, and when someone did need to access it, they’d have no version control, no alerts, no workflow, and so forth.
I’ve seen organizations create tiered SharePoint libraries, like the one Figure 1.5 shows. The idea is that “current” content lives in a “production” SharePoint server, with its own database. Older or dormant content is moved -- either manually or through some kind of automated process -- into an “archival” SharePoint installation, with its own database. The archival database isn’t backed up as frequently, may live on older, slower computers, and in general costs slightly less.
Figure 1.5: Tiered SharePoint storage.
Done properly, this approach even lets you maintain a single set of search indexes so that users can find older content. The problem is that older content becomes second-class, might be harder to get to in terms of performance, and still takes up space in a SQL Server database. This type of tiered storage isn’t necessarily ideal for every company, although it’s on the right track toward a better solution.
There’s a bit more to this dormant/archival content picture, and that’s how you actually identify dormant or archival content and move it out of SharePoint -- while somehow leaving it “in” SharePoint so that it’s still searchable and accessible. Let’s face it: If you expect users, or even administrators, to manually identify “old” content and mark it for archival in some fashion, it’s pretty much never going to happen. So you need to create some kind of automated, non-manual process that can identify content that hasn’t been accessed in a while, apply customizable business rules, and automatically migrate content into some other storage tier -- without “removing” it from SharePoint, of course.
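As a rough illustration of what such an automated process has to decide, here is a minimal Python sketch, assuming each content item carries a last-accessed timestamp and a size. The `ContentItem` type, the `select_for_archive` function, and the example rule are all hypothetical; they are not SharePoint APIs, and a real solution would read this information from SharePoint itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Sequence


@dataclass
class ContentItem:
    """Hypothetical stand-in for a document stored in SharePoint."""
    name: str
    last_accessed: datetime
    size_bytes: int


# A business rule is any function that says whether an item may be archived.
Rule = Callable[[ContentItem], bool]


def select_for_archive(
    items: Sequence[ContentItem],
    now: datetime,
    max_idle_days: int,
    rules: Sequence[Rule] = (),
) -> list[ContentItem]:
    """Return items idle for longer than max_idle_days that pass every rule."""
    cutoff = now - timedelta(days=max_idle_days)
    return [
        item
        for item in items
        if item.last_accessed < cutoff and all(rule(item) for rule in rules)
    ]


# Example: archive only items of at least 1,000,000 bytes untouched for a year.
now = datetime(2011, 3, 1)
items = [
    ContentItem("budget-2008.xlsx", datetime(2009, 1, 15), 5_000_000),
    ContentItem("roadmap.pptx", datetime(2011, 2, 20), 8_000_000),
    ContentItem("notes.txt", datetime(2008, 6, 1), 2_000),
]
large_only = lambda item: item.size_bytes >= 1_000_000
picked = select_for_archive(items, now, max_idle_days=365, rules=[large_only])
print([item.name for item in picked])  # ['budget-2008.xlsx']
```

Keeping the rules as plain functions is one way to make them customizable: an organization could add a rule for file type, site, or retention category without touching the selection logic.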
“Dormant” content can consist of a lot more than the odd rarely-used file. In fact, if you’ve really been using SharePoint, you might have entire sites that are dormant -- perhaps ones associated with a now-completed project -- and you want to dismantle them without making them permanently unavailable. You might want to treat old versions of files as “dormant,” while leaving the current and most-recent versions in your “production” SharePoint site -- but you don’t want to permanently delete those old versions. You might even be required to maintain older content, for regulatory reasons, but you don’t see any reason to bog down your day-to-day SharePoint operations to do so. There are lots of reasons to want to tier your SharePoint storage, and we’re going to need to investigate some of the methods that will let you do so.
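The idea of treating old file versions as dormant can be sketched in the same spirit. The following Python fragment assumes a document’s versions are available as a list ordered from oldest to newest; the function name `split_versions` and the `keep_latest` parameter are hypothetical, chosen only for illustration.

```python
def split_versions(
    versions: list[str], keep_latest: int = 2
) -> tuple[list[str], list[str]]:
    """Split versions (ordered oldest to newest) into (production, archive).

    The newest keep_latest versions stay in production; everything older
    is marked for the archive tier rather than deleted.
    """
    if keep_latest < 1:
        raise ValueError("keep_latest must be at least 1")
    production = versions[-keep_latest:]
    archive = versions[:-keep_latest]
    return production, archive


production, archive = split_versions(["1.0", "1.1", "2.0", "2.1"], keep_latest=2)
print(production)  # ['2.0', '2.1']
print(archive)     # ['1.0', '1.1']
```

Nothing is discarded here: every version ends up in exactly one of the two lists, which matches the requirement that older versions move to cheaper storage without being permanently deleted.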
This chapter is an excerpt from the book, Intelligently Reducing SharePoint Costs Through Storage Optimization, authored by Don Jones, and published by Realtime Publishers, November 2010, ISBN 978-1-935581-25-3, Copyright 2010 by Realtime Publishers. Download the complete book for free at Realtime Nexus Digital Library.
This was first published in March 2011