Hunting the encoding problem in content deployment – part 1

October 24th, 2007 by pholpar

Note: This is a repost of the original article, which was lost when the SharePointBlogs site crashed. The original article was published on 2007.04.06.

A few months ago, while experimenting with the content deployment feature of MOSS 2007, we found a problem with article pages that contain accented letters in their metadata (such as Title or Publishing Content). These pages were sometimes deployed to the target server with incorrect content, and the text on the page appeared to be wrongly encoded. A Google search turned up several forum and blog posts complaining about similar issues.

Content Deployment deploys weird characters and mangles the page layouts – bug? 

Microsoft SharePoint Products and Technologies Team Blog – Content Deployment 

Most users complained that the non-breaking space (&nbsp;) is converted to the character "Â" on the target system. Our language (Hungarian) has several accented letters, and these are also converted into strange two-character sequences, so the problem is even more frustrating for us.

We decided to investigate the source of the problem and now I would like to share the results with you.

Content deployment is one of the many under-documented features of MOSS. We found that when a content deployment job runs on the source server, the content is exported to the temporary folder you specified on the Central Administration site (Operations / Content Deployment Settings / Temporary Files / Path). The export package appears to have the same format as a simple export package, the kind you get when you call the Run() method of an SPExport object to create an export file. You may want to read Ton Stegeman's great article about how to do this (Exporting and importing SharePoint 2007 content using the object model).

The export files created during content deployment really are temporary. On a small site with little content, it is therefore not easy to catch the package after the export but before the import has finished. To tell the truth, it sometimes seemed to us that no files were created at all. Perhaps MOSS handles smaller exports in memory, or perhaps the files are there but get deleted before we can catch them.
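If you still want to try catching the temporary files, one option is to poll the folder and copy every new file somewhere safe. The Python sketch below illustrates that idea; it is not something we used ourselves. It assumes that `src_dir` is the temporary path configured in Central Administration and that the process has read access to it. Files that are created and deleted between two polls will still be missed. A file copied while MOSS is still writing it may also be incomplete, because it is marked as seen after the first successful copy.

```python
import os
import shutil
import time


def copy_new_files(src_dir: str, dst_dir: str, seen: set) -> list:
    """Copy every file under src_dir that was not copied before into dst_dir.

    Returns the relative paths copied in this pass. Files that disappear or
    are locked between listing and copying are skipped and retried next pass.
    """
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            if rel in seen:
                continue
            dst = os.path.join(dst_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            try:
                shutil.copy2(src, dst)
            except (FileNotFoundError, PermissionError):
                continue  # deleted or still locked; try again on the next pass
            seen.add(rel)
            copied.append(rel)
    return copied


def watch(src_dir: str, dst_dir: str, seconds: float = 60.0, interval: float = 0.1) -> set:
    """Poll src_dir for `seconds`, copying new files as they appear."""
    seen: set = set()
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        copy_new_files(src_dir, dst_dir, seen)
        time.sleep(interval)
    return seen
```

To use it, start `watch(...)` just before running the deployment job, with a destination folder outside the temporary path.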

We found it much easier to run an export and take its output files than to hunt for the temporary files during content deployment. Because the export and content deployment processes are so similar, we chose the former to generate packages for our investigation.

The export package is basically a .CAB file containing a relatively large Manifest.xml, some smaller .XML files and several .DAT files. The .DAT files hold the content itself (be it an .ASPX page, a document or an image). The manifest contains the information required to rebuild the site content from the .DAT files, including metadata such as content types and item properties.
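Once the .CAB has been extracted with any CAB extraction tool, the manifest can be read directly to see which .DAT file belongs to which page. The Python sketch below is our own illustration and not part of MOSS. It assumes that each exported file is described by a `File` element whose `Url` and `FileValue` attributes name the page and its .DAT file, as in the file-object excerpt in this post. It also ignores any XML namespace on element names, in case the real manifest declares one.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an XML namespace, if present: '{urn:x}File' -> 'File'."""
    return tag.rsplit("}", 1)[-1]


def dat_files(manifest_xml) -> dict:
    """Map each exported file's URL to the .DAT file holding its content.

    manifest_xml may be str or bytes; bytes let the parser honour the
    file's own encoding declaration.
    """
    root = ET.fromstring(manifest_xml)
    result = {}
    for elem in root.iter():
        if local(elem.tag) == "File" and "FileValue" in elem.attrib:
            result[elem.get("Url")] = elem.get("FileValue")
    return result
```

For example, `dat_files(open("Manifest.xml", "rb").read())` should map "Pages/default.aspx" to "0000000E.dat" for the package discussed here.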

We checked the content of the manifest and found that there are two parallel sections for each publishing page.

One of these sections describes the list item object:

    <SPObject Id="2c1ad535-3bff-43ce-8b0c-69eef3425fe5" ObjectType="SPListItem" ParentId="fc0d7053-b70e-4603-8fd3-39e5d40533f9" ParentWebId="87493b75-a13c-47cf-9d01-1f52112209b5" ParentWebUrl="/" Url="/Pages/default.aspx">
        <ListItem FileUrl="Pages/default.aspx" DocType="File" ParentFolderId="44ca4ae0-0dd3-4705-a460-b4d98b5ea5b4" Id="2c1ad535-3bff-43ce-8b0c-69eef3425fe5" ParentWebId="87493b75-a13c-47cf-9d01-1f52112209b5" ParentListId="fc0d7053-b70e-4603-8fd3-39e5d40533f9" Name="default.aspx" DirName="Pages" IntId="1" DocId="a51a0a90-647b-4d90-be78-42c3fe94caff" Version="5.0" ContentTypeId="0x010100C568DB52D9D0A14D9B2FDCC96666E9F2007948130EC3DB064584E219954237AF390064DEA0F50FC8C147B0B6EA0636C4A7D40015B6CA5925E96F4596C8AE31EF05B195" Author="1073741823" ModifiedBy="1073741823" TimeLastModified="2007-04-04T15:10:15" TimeCreated="2007-03-28T11:52:04" ModerationStatus="Approved">
            <Fields>
                <Field Name="_ModerationComments" FieldId="34ad21eb-75bd-4544-8c73-0e08330291fe" />
                <Field Name="Modified_x0020_By" FieldId="822c78e3-1ea9-4943-b449-57863ad33ca9" />
                <Field Name="Created_x0020_By" FieldId="4dd7e525-8d6b-4cb4-9d3e-44ee25f973eb" />
                <Field Name="File_x0020_Type" Value="aspx" FieldId="39360f11-34cf-4356-9945-25c44e68dade" />
                <Field Name="HTML_x0020_File_x0020_Type" FieldId="0c5e0085-eb30-494b-9cdd-ece1d3c649a2" />
                <Field Name="_SourceUrl" FieldId="c63a459d-54ba-4ab7-933a-dcf1c6fadec2" />
                <Field Name="_SharedFileIndex" FieldId="034998e9-bf1c-4288-bbbd-00eacfc64410" />
                <Field Name="Title" Value="This is the title" FieldId="fa564e0f-0c70-4ab9-b863-0177e6ddd247" />
            </Fields>
        </ListItem>
    </SPObject>

The other one describes the file object:

    <SPObject Id="a51a0a90-647b-4d90-be78-42c3fe94caff" ObjectType="SPFile" ParentId="44ca4ae0-0dd3-4705-a460-b4d98b5ea5b4" ParentWebId="87493b75-a13c-47cf-9d01-1f52112209b5" ParentWebUrl="/" Url="/Pages/default.aspx">
        <File Url="Pages/default.aspx" Id="a51a0a90-647b-4d90-be78-42c3fe94caff" ParentWebId="87493b75-a13c-47cf-9d01-1f52112209b5" ParentWebUrl="/" Name="default.aspx" ListItemIntId="1" ListId="fc0d7053-b70e-4603-8fd3-39e5d40533f9" ParentId="44ca4ae0-0dd3-4705-a460-b4d98b5ea5b4" TimeCreated="2007-03-28T11:52:41" TimeLastModified="2007-04-04T15:08:46" Version="5.0" SetupPath="SiteTemplates\BLANKINTERNET\default.aspx" SetupPathVersion="3" SetupPathUser="1073741823" FileValue="0000000E.dat" ModifiedBy="1073741823">
            <Properties>
                <Property Name="vti_cachedtitle" Type="String" Access="ReadOnly" Value="This is the title" />
                <Property Name="ContentTypeId" Type="String" Access="ReadWrite" Value="0x010100C568DB52D9D0A14D9B2FDCC96666E9F2007948130EC3DB064584E219954237AF390064DEA0F50FC8C147B0B6EA0636C4A7D40015B6CA5925E96F4596C8AE31EF05B195" />
                <Property Name="vti_cachedneedsrewrite" Type="Boolean" Access="ReadOnly" Value="false" />
                <Property Name="vti_parserversion" Type="String" Access="ReadOnly" Value="12.0.0.4518" />
                <Property Name="vti_charset" Type="String" Access="ReadOnly" Value="utf-8" />
                <Property Name="vti_title" Type="String" Access="ReadOnly" Value="This is the title" />
            </Properties>
        </File>
    </SPObject>

Notice that these sections contain redundant information. The same value (in this case the Title) is exported twice: once as a list item field value and once as a file property value (vti_title). Since the two values are identical here, this causes no problem.
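The list item's `DocId` matches the file's `Id` (a51a0a90-647b-4d90-be78-42c3fe94caff in the excerpts above), so the two copies of the title can be compared mechanically. The minimal Python sketch below assumes the element and attribute names shown in the excerpts and ignores any XML namespace. The function name `title_mismatches` is our own.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an XML namespace, if present."""
    return tag.rsplit("}", 1)[-1]


def title_mismatches(manifest_xml) -> list:
    """Return (url, title_field, vti_title) for every page whose copies differ."""
    root = ET.fromstring(manifest_xml)
    field_titles = {}  # list item DocId -> value of its Title field
    vti_titles = {}    # file Id -> (file Url, value of its vti_title property)
    for elem in root.iter():
        name = local(elem.tag)
        if name == "ListItem":
            for field in elem.iter():
                if local(field.tag) == "Field" and field.get("Name") == "Title":
                    field_titles[elem.get("DocId")] = field.get("Value")
        elif name == "File":
            for prop in elem.iter():
                if local(prop.tag) == "Property" and prop.get("Name") == "vti_title":
                    vti_titles[elem.get("Id")] = (elem.get("Url"), prop.get("Value"))
    mismatches = []
    for file_id, (url, vti_title) in vti_titles.items():
        title = field_titles.get(file_id)
        if title is not None and title != vti_title:
            mismatches.append((url, title, vti_title))
    return mismatches
```

For the manifest excerpted above, the result should be an empty list, because both copies read "This is the title".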

But what happens if the field value contains special characters, such as accented letters? Let's test the title with the expression we use to check applications for Hungarian language compatibility: "Árvíztűrő tükörfúrógép", which translates into English as "flood-resistant mirror drill". The words in this expression contain every accented letter that exists in our language.

We set this expression as the title of an article page and exported the content of the Pages folder. In Manifest.xml, the Title field of the list item contained the correct value ("Árvíztűrő tükörfúrógép"), but the value of the vti_title property had been transformed into "A?rvA­ztñr? tA¬kArfA§rA3gAcp". We then restored the content of the package to another SPS web application, and the page title displayed the correct value. It displayed the converted value only after we manually removed the XML node containing the Title field value from Manifest.xml. So it seems that the Title field value normally takes precedence over the redundant vti_title property. However, if the field value is missing (or the precedence is somehow reversed), the incorrect value may be displayed.
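In the damaged string, each accented letter has been replaced by two characters. That pattern is typical of UTF-8 bytes being read back with a single-byte code page. This part of the article does not establish which encoding is actually involved, and Windows-1252 is only our assumption. Under that assumption, Python reproduces the same shape:

```python
def mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes with a single-byte codec.

    Bytes the codec cannot map (0x81 in cp1252, for example) become U+FFFD.
    """
    return text.encode("utf-8").decode(wrong_codec, errors="replace")


original = "Árvíztűrő tükörfúrógép"
garbled = mojibake(original)
print(garbled)
print(len(original), len(garbled))  # 22 31: each of the 9 accented letters became 2 characters
```

The UTF-8 encoding of "Á" is C3 81, and Windows-1252 has no character for 0x81. That byte comes out as U+FFFD here, which resembles the "?" after the first "A" in the manifest value, although we have not verified that this is the actual cause.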

Using the export method, we were able to reproduce the non-breaking space conversion problem too. We set the HTML source of a page's content to "non&nbsp;breaking&nbsp;space". Manifest.xml in the export package then contains "nonÂ breakingÂ space" for the PublishingPageContent file property value and "non breaking space" for the PublishingPageContent list item field value. In the list item field value, the spaces are not standard spaces (character code 32) but non-breaking spaces (character code 160).
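The non-breaking space fits the same pattern. In UTF-8 it is the two bytes C2 A0. Read as Windows-1252 (again our assumption), those bytes become "Â" followed by a non-breaking space, which matches the extra letter before each space in the file property value. The sketch below also shows the reverse operation, which can repair such text as long as no byte was lost along the way. The helper name `repair` is our own.

```python
source = "non\u00a0breaking\u00a0space"  # list item field value: U+00A0 where &nbsp; was

# Assumed fault: the UTF-8 bytes are read back as Windows-1252.
garbled = source.encode("utf-8").decode("cp1252")
assert garbled == "non\u00c2\u00a0breaking\u00c2\u00a0space"  # "Â" + NBSP per space


def repair(text: str, wrong_codec: str = "cp1252") -> str:
    """Undo the mis-decoding: recover the original bytes, then decode them as UTF-8."""
    return text.encode(wrong_codec).decode("utf-8")


assert repair(garbled) == source
```

The repair only works when nothing was lost. Take "Á": its UTF-8 byte 0x81 has no Windows-1252 mapping and becomes U+FFFD. A string containing U+FFFD cannot be re-encoded as cp1252, so `repair` raises UnicodeEncodeError.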

We have not yet found out why the correct values sometimes fail to take precedence over the incorrect ones during content deployment. Instead, we concentrated on why and how the incorrect values are produced in Manifest.xml. The next part of this article focuses on that topic.

Primary key violation when using the SPExport object

May 22nd, 2007 by pholpar

Last month we built a simple utility that creates a backup of a single MOSS list or document library (or of a given set of lists and document libraries) and imports the backup. It was similar to the code on Ton Stegeman's blog.

After we had used it successfully for a few weeks, we found that in some cases it throws an exception like this: Violation of PRIMARY KEY constraint 'PK__#ExportObjects____XXXXXXXX'. Cannot insert duplicate key in object 'dbo.#ExportObjects'. The exception was thrown during the "Progress: Calculating Objects to Export." phase of the export.

This message was not new to us, because we had seen it before when running content deployment in incremental mode. At that time we learned from a comment Ryan Steeno posted on Ton Stegeman's blog that it is a known issue. Later we found a similar post on Jesper's blog. Unfortunately, none of the suggested version setting workarounds worked for us (Site Collection Images, Site Collection Style Library, and several others we tried), so we were left with the full deployment option.

Getting this exception from the utility, however, was new; we had not read about it anywhere. We had found no workaround for the problem and we were curious, so we decided to run a SQL trace to track down its source.

We found that SPExport.Run() calls several proc_Depl* stored procedures. When the error occurs, the primary key violation (PKV) is caused by the proc_DeplAddListItemDependencies stored procedure (SP), which is called directly by the proc_DeplAddExportObjectDependencies SP. Checking the code, we found that the first part of this SP is responsible for the error. This SP inserts new records into the #ExportObjects table. It does check that only items whose IDs do not already exist in the table are inserted. However, nothing guarantees that the IDs of the inserted items are unique among themselves.

In this case, for items of type 8 in #ExportObjects (which seems to denote content types), the ID is calculated from the last 16 digits of the content type ID. The INSERT statement tries to insert the same content type several times (there seems to be one row for each content type/folder pair). The same ID is calculated each time, so the PKV is no big surprise.
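The mechanism can be reproduced outside SQL Server. The following Python/SQLite sketch is a heavily simplified, hypothetical model. The table and column names (`ExportObjects`, `CtFolderPairs`, `ParentId`) are our own and do not reflect the real schema used by the proc_Depl* procedures. The model shows that a `NOT IN` check against rows already in the table does not prevent duplicate keys produced by a single `INSERT ... SELECT DISTINCT` when one content type is used in two folders.

```python
import sqlite3

# Content type ID taken from the manifest excerpt; the folder IDs are made up.
CT_ID = ("0x010100C568DB52D9D0A14D9B2FDCC96666E9F2007948130EC3DB064584E219954237AF39"
         "0064DEA0F50FC8C147B0B6EA0636C4A7D40015B6CA5925E96F4596C8AE31EF05B195")


def setup() -> sqlite3.Connection:
    """Hypothetical stand-ins for #ExportObjects and the content type/folder pairs."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ExportObjects (Id TEXT PRIMARY KEY, Type INTEGER, ParentId TEXT);
        CREATE TABLE CtFolderPairs (ContentTypeId TEXT, FolderId TEXT);
    """)
    # The same content type is used in two folders.
    con.executemany("INSERT INTO CtFolderPairs VALUES (?, ?)",
                    [(CT_ID, "folder-1"), (CT_ID, "folder-2")])
    return con


# The NOT IN filter only sees rows that were in ExportObjects before this
# statement ran; both pairs still produce the same Id (last 16 characters).
BUGGY_INSERT = """
    INSERT INTO ExportObjects (Id, Type, ParentId)
    SELECT DISTINCT substr(ContentTypeId, -16), 8, FolderId
    FROM CtFolderPairs
    WHERE substr(ContentTypeId, -16) NOT IN (SELECT Id FROM ExportObjects)
"""


def run_buggy_insert() -> str:
    con = setup()
    try:
        con.execute(BUGGY_INSERT)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"PK violation: {exc}"


print(run_buggy_insert())
```

DISTINCT does not help, because the two rows differ in their folder column even though their computed IDs are the same.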

In our test system, we modified (hacked) proc_DeplAddExportObjectDependencies so that it produces only a single row for each content type. An alternative is to keep the several content type/folder pairs for each content type and generate their IDs with NEWID(). This latter approach also requires a helper temporary table. The reason is that calling NEWID() directly in the SELECT statement would produce extra records: with a unique value in every row, DISTINCT no longer removes the duplicates.
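Both workarounds can be illustrated with the same kind of simplified, hypothetical SQLite model, which also shows why the second one needs a helper table. The table names are our own, not the real schema, and `lower(hex(randomblob(16)))` stands in for SQL Server's NEWID(). The source table deliberately lists one content type/folder pair twice:

```python
import sqlite3

# Content type ID from the manifest excerpt; the folder IDs are made up.
CT_ID = ("0x010100C568DB52D9D0A14D9B2FDCC96666E9F2007948130EC3DB064584E219954237AF39"
         "0064DEA0F50FC8C147B0B6EA0636C4A7D40015B6CA5925E96F4596C8AE31EF05B195")


def setup() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ExportObjects (Id TEXT PRIMARY KEY, Type INTEGER, Ref TEXT, ParentId TEXT);
        CREATE TABLE CtFolderPairs (ContentTypeId TEXT, FolderId TEXT);
    """)
    # One content type in two folders, with one of the pairs listed twice.
    con.executemany("INSERT INTO CtFolderPairs VALUES (?, ?)",
                    [(CT_ID, "folder-1"), (CT_ID, "folder-1"), (CT_ID, "folder-2")])
    return con


def count(con: sqlite3.Connection) -> int:
    return con.execute("SELECT COUNT(*) FROM ExportObjects").fetchone()[0]


def one_row_per_content_type() -> int:
    """Workaround 1: collapse the pairs so each content type yields a single row."""
    con = setup()
    con.execute("""
        INSERT INTO ExportObjects (Id, Type, Ref, ParentId)
        SELECT substr(ContentTypeId, -16), 8, ContentTypeId, MIN(FolderId)
        FROM CtFolderPairs
        WHERE substr(ContentTypeId, -16) NOT IN (SELECT Id FROM ExportObjects)
        GROUP BY ContentTypeId
    """)
    return count(con)


def fresh_id_per_pair() -> int:
    """Workaround 2: stage DISTINCT pairs in a helper table, then assign new IDs."""
    con = setup()
    con.executescript("""
        CREATE TEMP TABLE Pairs AS
            SELECT DISTINCT ContentTypeId, FolderId FROM CtFolderPairs;
        INSERT INTO ExportObjects (Id, Type, Ref, ParentId)
            SELECT lower(hex(randomblob(16))), 8, ContentTypeId, FolderId FROM Pairs;
    """)
    return count(con)


def fresh_id_without_staging() -> int:
    """A random ID inside SELECT DISTINCT makes every row distinct, so duplicates survive."""
    con = setup()
    con.execute("""
        INSERT INTO ExportObjects (Id, Type, Ref, ParentId)
        SELECT DISTINCT lower(hex(randomblob(16))), 8, ContentTypeId, FolderId
        FROM CtFolderPairs
    """)
    return count(con)


print(one_row_per_content_type())   # 1
print(fresh_id_per_pair())          # 2
print(fresh_id_without_staging())   # 3
```

The third function shows why the staging step matters. Without it, the duplicated folder-1 pair is inserted twice, each time under a different random ID.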

With this modification, incremental content deployment works, and the export utility no longer throws the exception. But this is certainly not a recommended (or supported) approach for a production environment.

After this, we found that exporting sites with STSADM works. Its output messages made it clear that STSADM also uses the SPExport API, so we hoped that examining its code would lead us to a proper resolution for our problem.

Site export in STSADM is implemented by Microsoft.SharePoint.StsAdmin.SPExportOperation.Run(). There were several differences between the SPExportSettings we used in our code and those used by STSADM. We found that the PKV problem can be solved with the ExcludeDependencies = true setting.

Today we were notified that Microsoft's hotfix for the incremental content deployment PKV is available, and we read the same news on William Cornwill's blog. Our export utility now runs with ExcludeDependencies = true. Even so, we hope the fix will also solve the issue of running SPExport with the default SPExportSettings value, ExcludeDependencies = false.