Stephane Eyskens's blog

SharePoint User Profiles + BCS + XML Serialization problems when data is not sanitized

Hi,

Let me tell you a (true) story that happened to me recently. I was working with SharePoint for one of my customers and ran into a problem I had never encountered before. To my surprise, I couldn't find anything on the web describing the issue, let alone how to fix it!

The User Profile engine is certainly one of the most complex components of SharePoint. Looking at Forefront Identity Manager (FIM) can give you some headaches...

Here is the scenario that can lead to trouble:

  • Your primary User Profile (UP) data connection is Active Directory
  • You are using a second UP data connection based on a BCS layer in order to supplement the user profiles (see this blog post for more info on how to combine BCS & User Profiles). For simplicity, say that you've created the BCS layer with SharePoint Designer
  • You execute a full or an incremental sync, but the property values coming from the BCS are not reflected in your User Profiles...

Looking at the FIM client, you find the following error:

stopped-extension-dll

There is no other message in FIM. However, in the Windows event log, you might find a more detailed error that includes a stack trace.

At that point you wonder what the hell is going on. Remember, we didn't develop anything: we just created a BCS layer with SharePoint Designer, created a UP data connection on top of that BCS and mapped a UP property to a field coming from the BCS.

Looking closer at the stack trace, you'll notice that the User Profile engine failed to parse XML data because of an invalid character. That invalid character actually comes from your data. In that error, the hexadecimal value 0x0B stands for a vertical tab (\v), which was present in the data itself. Because of that, the UP engine crashes.

This really happened to me in the real world. I don't know how the employee got a vertical tab into her profile data, but she probably copied and pasted from Word or some other editing tool.

In this case the BCS layer itself works fine. The best proof is to create an external list: you will not encounter any problem. Depending on the browser you use, you may see the weird character on the detail view of the BCS item; otherwise, viewing the page source will reveal its presence.
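To see why a single character is enough to break things, here is a minimal sketch that is independent of SharePoint. It makes no claim about the code path the User Profile engine actually uses; it only shows that a standard .NET XML parser rejects a raw vertical tab in element content. The class and method names are made up for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical demo class, not part of SharePoint or FIM.
public static class VerticalTabDemo
{
    // Returns true when the .NET XML parser accepts the value as element content.
    // The value is concatenated as-is, so markup characters such as '<' would
    // also make the parse fail; this is a demo, not an escaping routine.
    public static bool ParsesAsXml(string value)
    {
        try
        {
            XDocument.Parse("<Data>" + value + "</Data>");
            return true;
        }
        catch (XmlException)
        {
            // Raised for characters outside the XML 1.0 character range, such as 0x0B.
            return false;
        }
    }
}
```

Calling VerticalTabDemo.ParsesAsXml("hello\vworld") returns false, while the same call with a regular space or a \t instead of the \v returns true.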

Reproducing this is very easy:

  • Create a table with two columns (Account, Data) and one record whose Data value contains \v or any other ASCII control character apart from \t, \n and \r
  • Create an external content type with SharePoint Designer
  • Create a User Profile data connection based on your BCS layer and map AccountName to your column Account
  • Create a User Profile property and map it to your BCS column Data
  • Perform a full sync

You should now be able to reproduce the problem. So how do you fix it? If the BCS layer was created with SharePoint Designer, I'm afraid your only option is to "clean" the data. While this can be envisioned, it is not always possible, for instance if the data comes from a legacy system you do not own or if you simply do not have write access to the data source.
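If you do go the cleaning route, the first job is finding out which records are at fault, since FIM does not tell you. Below is a small sketch of a scanner you could run over the values read from your data source. The class and method names are hypothetical, and the character ranges are the XML 1.0 ones; how you read the rows is up to you.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical diagnostic helper, not part of any SharePoint API.
public static class IllegalCharFinder
{
    // Yields the index of every UTF-16 code unit that XML 1.0 does not allow.
    // Well-formed surrogate pairs are skipped because they encode legal
    // characters in the 0x10000-0x10FFFF range.
    public static IEnumerable<int> FindIllegalPositions(string value)
    {
        if (value == null)
        {
            yield break;
        }

        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];

            if (char.IsHighSurrogate(c) && i + 1 < value.Length && char.IsLowSurrogate(value[i + 1]))
            {
                i++; // skip the low surrogate of a valid pair
                continue;
            }

            bool legal = c == 0x9 || c == 0xA || c == 0xD
                         || (c >= 0x20 && c <= 0xD7FF)
                         || (c >= 0xE000 && c <= 0xFFFD);

            if (!legal)
            {
                yield return i;
            }
        }
    }
}
```

For the value "ab\vcd" this yields a single position, 2, which is where the vertical tab sits.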

Another option that's much more robust is to write a custom .NET type or WCF service to sanitize your data before sending it back to the UP engine.

Before doing that, note that the following code, generated with the excellent tool BCS Meta Man, produces the same error when the UP sync is executed.

As you can see, by default, the Specific Finder method returns a .NET object to the UP engine:

[BcsMethodType(MethodType.SpecificFinder)]
public GetSingleBCSNETEntityByIDProperties GetSingleBCSNETEntityByID(string ACCOUNT)
{
    var dataContext = new BCSUPXMLPROBLEMContext(ConnectionString);
    // Project the matching row into the entity type returned to the UP engine.
    IEnumerable<GetSingleBCSNETEntityByIDProperties> records =
        from record in dataContext.BCSNET
        where record.ACCOUNT == ACCOUNT
        select new GetSingleBCSNETEntityByIDProperties
        {
            ACCOUNT = record.ACCOUNT,
            BCSDATA = record.BCSDATA // passed through as-is
        };
    return records.FirstOrDefault();
}

In our case, we only return a .NET type with two string members: ACCOUNT, the identifier, and BCSDATA, the data mapped to our User Profile property. So you might think you are safe, but you aren't if the database column BCSDATA itself contains an invalid character.

So, you need to sanitize your data as explained in this post.

Transforming the above code into something like this:

[BcsMethodType(MethodType.SpecificFinder)]
public GetSingleBCSNETEntityByIDProperties GetSingleBCSNETEntityByID(string ACCOUNT)
{
    var dataContext = new BCSUPXMLPROBLEMContext(ConnectionString);
    // Project the matching row into the entity type returned to the UP engine.
    IEnumerable<GetSingleBCSNETEntityByIDProperties> records =
        from record in dataContext.BCSNET
        where record.ACCOUNT == ACCOUNT
        select new GetSingleBCSNETEntityByIDProperties
        {
            ACCOUNT = record.ACCOUNT,
            // Strip the characters that XML 1.0 does not allow.
            BCSDATA = SanitizeXmlString(record.BCSDATA)
        };
    return records.FirstOrDefault();
}

/// <summary>
/// Remove illegal XML characters from a string.
/// </summary>
public string SanitizeXmlString(string xml)
{
	if (xml == null)
	{
		throw new ArgumentNullException("xml");
	}

	StringBuilder buffer = new StringBuilder(xml.Length);

	foreach (char c in xml)
	{
		if (IsLegalXmlChar(c))
		{
			buffer.Append(c);
		}
	}

	return buffer.ToString();
}

/// <summary>
/// Whether a given character is allowed by XML 1.0.
/// </summary>
public bool IsLegalXmlChar(int character)
{
	return
	(
		 character == 0x9 /* == '\t' == 9   */          ||
		 character == 0xA /* == '\n' == 10  */          ||
		 character == 0xD /* == '\r' == 13  */          ||
		(character >= 0x20    && character <= 0xD7FF  ) ||
		(character >= 0xE000  && character <= 0xFFFD  ) ||
		(character >= 0x10000 && character <= 0x10FFFF)
	);
}

will fix the problem at the cost of some performance overhead and a little more effort. Of course, you might implement your own escaping technique or, preferably, write a helper class, but I'm keeping it short here.

Last but not least: when this error occurs, it seems that it is not only the profiles containing special characters that fail to update; none of the profiles are updated. Scary, isn't it? :)

This was tested with SP1, not with the June 2011 CU, but I doubt that changes anything.
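If you go for a helper class, here is a minimal sketch of what it could look like. It is a variation on the sanitizing code, not a drop-in from any library, and the names XmlSanitizer and Sanitize are my own. It differs in two ways that are assumptions about what you want: it returns null or empty input unchanged instead of throwing (handy if the mapped column is nullable), and it keeps well-formed surrogate pairs, which a plain char-by-char check removes because each half of the pair falls in the 0xD800-0xDFFF range.

```csharp
using System;
using System.Text;

// Hypothetical helper class; adapt the name and namespace to your project.
public static class XmlSanitizer
{
    // Returns the input without the characters that XML 1.0 forbids.
    // Null and empty strings are returned unchanged.
    public static string Sanitize(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }

        StringBuilder buffer = new StringBuilder(value.Length);

        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];

            if (char.IsHighSurrogate(c) && i + 1 < value.Length && char.IsLowSurrogate(value[i + 1]))
            {
                // A valid pair encodes a character in 0x10000-0x10FFFF: keep both halves.
                buffer.Append(c).Append(value[i + 1]);
                i++;
            }
            else if (IsLegalBmpChar(c))
            {
                buffer.Append(c);
            }
            // Anything else (control characters, lone surrogates, 0xFFFE, 0xFFFF) is dropped.
        }

        return buffer.ToString();
    }

    // XML 1.0 character ranges that fit in a single UTF-16 code unit.
    private static bool IsLegalBmpChar(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
               || (c >= 0x20 && c <= 0xD7FF)
               || (c >= 0xE000 && c <= 0xFFFD);
    }
}
```

In the Specific Finder, the projection would then read BCSDATA = XmlSanitizer.Sanitize(record.BCSDATA). Being static, the helper can be shared by all your finder methods.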



Happy Coding!