The metadata features of SharePoint 2003 and 2007 enable you to customize, sort, group, filter and search documents and lists. In SharePoint, this equates to adding columns that describe and categorize list items. SharePoint 2007 takes a more disciplined content management approach with the introduction of site columns and content types.
Although powerful, there is an integrity flaw in both SharePoint 2003 and 2007 that you should be aware of. When you add a choice or multi-choice column, you can supply a list of values. If you delete or modify values in this list, any list items using the old values will not be updated and will still contain the original values. Those of you with database design experience can see the issue immediately: SharePoint does not maintain referential integrity. The list item column references a value in a lookup list which may change.
I'll demonstrate the issue.
Go to a document library and select Modify settings and columns.
Click on Add a new column.
Name the column "Color" and enter 3 choices "White, Brown and Balck." Notice that I have misspelled "Black." Although this is a simple example, errors like this occur in the "real world". In addition, values can and do change. Imagine you uploaded 100 documents using a given company name and the company goes through a name change. Without a utility, your only option is to modify the column value and update each documents by hand. From experience, SharePoint content managers usually just update the column list value without realizing existing list items contain the old value.
After adding the column, add a new document to the list. When prompted, select the value "Balck" for the color.
You can see the list item contains the misspelled value.
Go back to the document library and edit the new column you added. Change the value "Balck" to "Black."
Even though you changed the value, the list item still has the old value.
If you edit the list item, notice the color drop down reverts to the default value because it was unable to find "Balck" in the list. Until the list item is updated, it will contain an invalid value.
How do you deal with this issue? First, make sure your content managers are aware of the issue. From now on, use a utility when you need modify or remove list item values that are in use. You can do it by hand if there are a small number of list items.
Once the issue is under control, analyze and fix integrity issues. I developed a sample in C# that will analyze a SharePoint 2003 environment. In a later post, I will update it for 2007.
The code starts at a given url and recursively analyzes each list and list item verifying each column contains only valid values. The end result is a list of invalid list items that you could output to a log file or run nightly and send e-mail notifications when issues are encountered. This code is not comprehensive, it is only a sample to get you started.
If want to see if you have any metatdata integrity issues in your environment, copy the .exe in bindebug directory in the attached zip file to your SharePoint server and run it. After it runs, open the log file and see how many issues you have.
Download the full source code here.