This is the first of two posts about developing custom cross-list search solutions using either MOSS or WSS 3.0. The two-part series compares and contrasts developing cross-list search solutions with MOSS and WSS, including issues you should be aware of relating to metadata management, accuracy of results, and UI presentation of the fields to query against.
Introduction
Cross-list searching is not new. In WSS 2.0 you could search across lists within a site, although combining the results was slow and tedious. WSS 3.0 offers the new GetSiteData method of the SPWeb class, which takes an instance of the new SPSiteDataQuery class as an argument and returns all the results of a cross-list search in one DataTable. Cross-list searching was also available in SharePoint Portal Server 2003 via the Microsoft.SharePoint.Portal.Search namespace. In MOSS, cross-list searching is located in the Microsoft.Office.Server.Search.Query namespace in the Microsoft.Office.Server.Search assembly, and searches can be executed using either the KeywordQuery or the FullTextSqlQuery class. These two classes are also available on a WSS 3.0 server via the Microsoft.SharePoint.Search.Query namespace in the Microsoft.SharePoint.Search assembly. However, even though the classes are functionally equivalent, WSS 3.0 does not provide the ability to manage crawled properties (metadata), which, as you will see, is very important when developing custom search solutions.
Comparison of MOSS and WSS searching
| | MOSS | WSS |
|---|---|---|
| Namespace | Microsoft.Office.Server.Search.Query | Microsoft.SharePoint |
| Classes | FullTextSqlQuery or KeywordQuery | SPWeb and SPSiteDataQuery |
| Syntax | SQL or Keyword | CAML (Collaborative Application Markup Language) |
| Manage Metadata | Yes | No |
| Results | DataTable | DataTable |
| Result Latency | Results based on last crawl | No latency |
The importance of managing metadata
SharePoint metadata is data that describes or categorizes documents and list items. A user searching for documents in SharePoint will typically use keywords to describe what they are looking for. Expecting users to memorize all the different keywords that could describe those documents makes searching difficult. Giving the user a choice of metadata to search on is better.
MOSS provides the built-in "Search Center", which includes an "Advanced Search" page where you can search using "Property Restrictions". Here the user is presented with a drop down list of built-in metadata, or properties. However, this drop down list is not built dynamically; it is populated from the configuration settings of the "Advanced Search" web part. SharePoint site collections can be dynamic, with new lists, document libraries and columns being added daily. A good search solution will provide the user with the most current metadata to choose from.
Users should be able to easily understand what the metadata represents and how it categorizes documents and list items. Providing "friendly names" that have meaning within a group of users or a corporation will facilitate searching. MOSS lets you manage metadata via SharePoint 3.0 Central Administration. Unfortunately, WSS offers no equivalent. As users add columns to document libraries or content types, there is a risk that columns with the same name will be added across sites. The columns may share a name but represent different things depending on which document library they are in or which content type, if any, they belong to. Strategies for managing metadata will differ depending on which product you use.
Managing metadata for searching in MOSS
In order for your custom search solution to search against all SharePoint crawled properties without having to manually create managed properties, you must configure the crawled property category. In SharePoint Central Administration, click on the link below "Shared Services Administration". Go to "Search Settings", "Metadata property mappings", "Crawled Properties", "SharePoint", "Edit Category".
Under the "Bulk Crawled Property Settings" section, make sure that "Automatically discover new properties when crawl takes place" is checked, along with the "Map all string properties in this category to the Content managed property" and "Automatically generate a new managed property for each crawled property discovered in this category" options. Turning these options on ensures that managed properties are automatically created when new SharePoint columns are created. Your solution can then present these new managed properties to the user.
Unfortunately, the name of the managed property is not very user friendly. SharePoint crawled properties are prefixed with "ows_" and the auto-generated managed property is prefixed with "ows". For example, if a user creates a new column in a document library called "CustomerName", the crawled property will be "ows_CustomerName" and the managed property will be "owsCustomerName". If you don't want to display this to your users, you will have to write some code to parse out the real column name and make sure you map it back to the managed property name when constructing your query. Additional parsing may be needed if the column name contains spaces. For instance, if a user creates a column named "Customer Name", the crawled property will be "ows_Customer_x0020_Name" and the managed property will be "owsCustomerx0020Name".
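The name parsing described above can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than SharePoint object model code. It assumes the managed property name is simply the crawled property name with every underscore removed, which is inferred only from the two examples above, and it decodes the friendly name from the crawled property name because the `_xHHHH_` escapes are unambiguous there. The function names and the input list are hypothetical.

```python
import re

CRAWLED_PREFIX = "ows_"


def friendly_name(crawled: str) -> str:
    """Turn 'ows_Customer_x0020_Name' into 'Customer Name'."""
    name = crawled[len(CRAWLED_PREFIX):] if crawled.startswith(CRAWLED_PREFIX) else crawled
    # Special characters are escaped as _xHHHH_, where HHHH is a hex code point.
    return re.sub(r"_x([0-9A-Fa-f]{4})_", lambda m: chr(int(m.group(1), 16)), name)


def managed_name(crawled: str) -> str:
    """Assumed rule (from the two examples in the text): drop every underscore."""
    return crawled.replace("_", "")


def build_property_map(crawled_names):
    """Map each friendly name to the managed property name used in the query."""
    return {friendly_name(c): managed_name(c) for c in crawled_names}


# Hypothetical input: crawled property names collected from the SharePoint category.
property_map = build_property_map(["ows_CustomerName", "ows_Customer_x0020_Name"])
print(property_map)
```

Keeping a lookup table from friendly name to managed property name avoids having to re-encode the name when the query is built; the drop down list shows the keys, and the query uses the values.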
Another strategy for managing metadata is to periodically monitor new crawled properties and map them to managed properties manually. This might include restricting who can add columns to site collections. However, in large site collections this could become a slow process and prevent users from finding the documents they need. Finally, the search solution should provide a way for users to scope their searches, either by selecting MOSS scopes or by selecting certain document libraries.
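To make the scoping idea concrete, here is a hedged sketch of how the query text for a FullTextSqlQuery might be assembled from a user-selected scope and managed property. It only builds the string; it does not call SharePoint. The helper names are hypothetical, "All Sites" is used as an example scope name, and quoting by doubling single quotes is an assumption based on standard SQL conventions.

```python
def sql_literal(value: str) -> str:
    """Quote a string value, doubling embedded single quotes (assumed escaping rule)."""
    return "'" + value.replace("'", "''") + "'"


def build_scoped_query(scope, managed_property, value, columns=("Title", "Path")):
    """Build search SQL restricted to one scope and one managed property value."""
    # Managed property names come from a trusted lookup, but reject anything
    # that is not a plain identifier so user input cannot alter the query.
    if not managed_property.isalnum():
        raise ValueError("unexpected managed property name: " + managed_property)
    return (
        "SELECT " + ", ".join(columns) + " FROM Scope() "
        'WHERE "scope" = ' + sql_literal(scope) + " "
        "AND " + managed_property + " = " + sql_literal(value)
    )


query_text = build_scoped_query("All Sites", "owsCustomerName", "O'Brien Ltd")
print(query_text)
```

The resulting string would be assigned to the query object's text before executing it. Only a single scope restriction is generated here, so the sketch does not depend on "OR" logic.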
Managing metadata for searching in WSS
Using WSS to develop a custom search solution requires code that crawls the various document libraries to build a unique list of columns the user can search against. This presents a problem when columns have the same name but different data types. For instance, an "Invoice Number" column could be defined twice in different document libraries, once as text and once as a number. The SPSiteDataQuery class uses CAML, and the CAML syntax requires the Type attribute to be set to the corresponding SharePoint column data type. Therefore, in order to construct a valid CAML query, you would have to present the column twice in the drop down list along with its data type (e.g. Invoice Number (Number)).
Users searching for documents using WSS will relate to the display name of columns in SharePoint, so you will want to populate the drop down list with each column's display name. However, CAML requires the column's internal name, and your solution will have to map the display name to the internal name to construct valid CAML.
Other problems arise when columns are renamed. A particular display name may map to multiple columns with different internal names. For instance, suppose two columns are created on different document libraries, "Customer Age" and "Age". After a while someone decides to make the column names consistent across document libraries and renames the "Customer Age" column to "Age". Renaming changes only the display name; the internal name stays the same. The search solution now presents just "Age" in the drop down list, and it has to map "Age" to two columns with the internal names "Customer_x0020_Age" and "Age" when generating the CAML query. So if the user selects "Age" and wants to find documents where "Age" is equal to 25, two criteria will have to be generated in the CAML Where clause, as listed below:
<Where>
  <Or>
    <Eq>
      <FieldRef Name="Customer_x0020_Age" />
      <Value Type="Number">25</Value>
    </Eq>
    <Eq>
      <FieldRef Name="Age" />
      <Value Type="Number">25</Value>
    </Eq>
  </Or>
</Where>
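The mapping from display names to internal names and data types, and the generation of a Where clause like the one above, could be sketched as follows. This is a Python illustration of the logic only, not SharePoint object model code. The COLUMNS list stands in for whatever your own code collects while crawling the document libraries, and every name in it is hypothetical. Because a CAML Or element takes exactly two operands, additional columns are combined by nesting.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical crawl output: (library, display name, internal name, CAML type).
COLUMNS = [
    ("Contracts", "Age", "Customer_x0020_Age", "Number"),
    ("Members", "Age", "Age", "Number"),
    ("Invoices", "Invoice Number", "Invoice_x0020_Number", "Text"),
    ("Orders", "Invoice Number", "Invoice_x0020_Number", "Number"),
]


def build_choices(columns):
    """Return {drop-down label: (CAML type, [internal names])}."""
    groups = defaultdict(list)
    for _library, display, internal, caml_type in columns:
        if internal not in groups[(display, caml_type)]:
            groups[(display, caml_type)].append(internal)
    types_per_display = defaultdict(set)
    for display, caml_type in groups:
        types_per_display[display].add(caml_type)
    choices = {}
    for (display, caml_type), internals in groups.items():
        # Append the type only when one display name covers several data types.
        ambiguous = len(types_per_display[display]) > 1
        label = f"{display} ({caml_type})" if ambiguous else display
        choices[label] = (caml_type, internals)
    return choices


def eq_element(internal, caml_type, value):
    """Build one <Eq> comparison for a single internal column name."""
    eq = ET.Element("Eq")
    ET.SubElement(eq, "FieldRef", Name=internal)
    ET.SubElement(eq, "Value", Type=caml_type).text = str(value)
    return eq


def build_where(caml_type, internals, value):
    """Combine one <Eq> per internal name, nesting <Or> two operands at a time."""
    node = eq_element(internals[0], caml_type, value)
    for internal in internals[1:]:
        parent = ET.Element("Or")
        parent.append(node)
        parent.append(eq_element(internal, caml_type, value))
        node = parent
    where = ET.Element("Where")
    where.append(node)
    return ET.tostring(where, encoding="unicode")


choices = build_choices(COLUMNS)
caml = build_where(*choices["Age"], 25)
print(sorted(choices))
print(caml)
```

With the sample data, the drop down list would contain "Age", "Invoice Number (Number)" and "Invoice Number (Text)", and selecting "Age" produces the same structure as the hand-written CAML. Generating the query is a separate matter from whether WSS evaluates it correctly.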
In principle, the CAML above should solve the problem. Unfortunately, in practice it returns no results. This leads us to the main problem with searching in MOSS or WSS: the inability to do "OR" searching. In the next part of this series I will illustrate why you cannot rely on MOSS or WSS to return correct results when doing logical "OR" searches.