SharePoint 2010 Search Ranking Models

September 30, 2011

SharePoint 2010 has a new feature that allows administrators to change ranking models. The default ranking model is the product of a great deal of research on Microsoft’s part and, for the most part, works very well. However, SharePoint 2010 also lets administrators switch from the default ranking algorithm to another built-in one, or even create their own based on a provided schema. This ‘feature’ was touted as a great new part of SharePoint 2010 when it was released. Unfortunately, only half of it is straightforward. Viewing the IDs of the built-in ranking algorithms and switching between them is easy. Making your own (that work) is not. I spent many hours playing around with the schema, basically reverse engineering it from the SharePoint Search Service Admin database, before eventually finding a model that works with minor tweaks. I will explain here how I did it and how you can use the code below to make your own ranking adjustments.

On Ranking

Ranking in SharePoint is a two-stage algorithm based on the Okapi BM25 TF/IDF ranking model devised at City University in London. Microsoft has put a lot of research and adjustment into this algorithm and its details are, naturally, a well-kept secret. However, the basics can be found in articles and patents around the web, and building your own model on top of the algorithm will also help you understand how it works. The idea behind any ranking algorithm is that, for any query, the best result is placed at the top of the page and less ‘perfect’ results are ranked below it in descending order. The difficulty is that there are many factors to consider when ranking documents. Before we continue with the details of the algorithm and how to adjust it, let’s look at how to see the built-in ranking algorithms in SharePoint.
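To make the BM25 idea concrete, here is a minimal sketch of the textbook Okapi BM25 formula in Python. It is not Microsoft’s implementation, which is not public; the function name, the sample documents, and the k1 and b values are all assumptions chosen for illustration. The b parameter controls how strongly long documents are penalized, which is the general idea behind length normalization.

```python
import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: in how many documents does each query term appear?
    doc_freq = {t: sum(1 for doc in documents if t in doc) for t in query_terms}
    scores = []
    for doc in documents:
        term_counts = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_counts[term]
            if tf == 0:
                continue
            # Terms that occur in fewer documents get a higher IDF.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores

# Hypothetical mini corpus, already split into words.
docs = [
    "the dog chased the ball".split(),
    "a long article about cats that mentions a dog once near the end".split(),
    "nothing relevant here".split(),
]
scores = bm25_scores(["dog"], docs)
# Document indexes ordered from best match to worst.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The short document that mentions ‘dog’ outranks the long one that mentions it once, and the document without the term scores zero.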

Finding the built-in algorithms

To see the built-in ranking algorithms in SharePoint, open the SharePoint Management Shell (PowerShell) and run the following command:

Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchRankingModel

This will display a list of all ranking algorithms. You can then take the GUID of the algorithm of your choice and pass it in the rm parameter of your search center’s results URL to see how it changes the ranking. For example:

http://SP2010/Search/results.aspx?k=queryterm&s=All%20Sites&rm=97cbcebd-037c-4346-9bc4-582d8c560204

This will search for the term ‘queryterm’ and apply the high proximity ranking algorithm. This is one of my favourites, as it ranks documents higher when the search terms appear close to each other.
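If you want to compare several models side by side, it can help to generate these URLs rather than edit them by hand. The following is a small Python sketch; the helper name results_url is my own, and the server name and GUID are simply the ones from the example above.

```python
from urllib.parse import urlencode, quote

def results_url(base_url, query, ranking_model_id, scope="All Sites"):
    """Build a search results URL with k (query), s (scope) and rm (ranking model)."""
    params = {"k": query, "s": scope, "rm": ranking_model_id}
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)

url = results_url(
    "http://SP2010/Search/results.aspx",
    "queryterm",
    "97cbcebd-037c-4346-9bc4-582d8c560204",
)
print(url)
```

Swapping in a different GUID from the list of ranking models gives you a URL for each model that you can open in separate browser tabs.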

The ranking algorithms that are built in and easy to choose are:

  • MainResultsDefaultModel
  • ExpertiseSocialDistanceModel
  • HighProximityRankingModel
  • MainPeopleSocialDistanceModel
  • ExpertiseModel
  • NameSocialDistanceModel
  • NameModel
  • MainPeopleModel
  • NoProximityRankingModel

If you want to make a search results page permanently use a specific ranking algorithm, you can set the DefaultRankingModelID property on the Core Search Results Web Part to the GUID of the specific model.

You can also make your own ranking algorithms and apply them (well, sort of). Microsoft provides a default schema you can download and modify, along with a procedure for adding rules based on that schema and applying them with PowerShell.

To create a new ranking model and apply your schema to it, use the following PowerShell:

Get-SPEnterpriseSearchServiceApplication | New-SPEnterpriseSearchRankingModel -RankingModelXML "{XML String of Schema}"

If you have more than one SSA you will want to specify the correct one, but this works in most cases. The XML string is touchy and you must remove all spaces between tags, so be careful.
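Removing the whitespace between tags by hand is error-prone, so here is one possible way to automate it. This Python sketch (the function name and the demo model are hypothetical) collapses whitespace between tags and then parses the result as a sanity check. Note that the demo uses single quotes around attribute values, which makes the string easier to embed inside a double-quoted PowerShell argument.

```python
import re
import xml.etree.ElementTree as ET

def collapse_between_tags(xml_text):
    """Strip whitespace that sits between a closing '>' and the next '<'."""
    compact = re.sub(r">\s+<", "><", xml_text.strip())
    # Parsing raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    ET.fromstring(compact)
    return compact

# Hypothetical, human-readable model with placeholder values.
pretty = """
<rankingModel name='Demo' id='GUID' description='Demo' xmlns='http://schemas.microsoft.com/office/2009/rankingModel'>
  <queryDependentFeatures>
    <queryDependentFeature name='Title' pid='2' weight='1.0' lengthNormalization='0.5' />
  </queryDependentFeatures>
</rankingModel>
"""
compact = collapse_between_tags(pretty)
print(compact)
```

This lets you keep an indented copy of the schema for editing and generate the single-line string only when you are ready to pass it to the cmdlet.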

 

Query Dependent and Independent ranking

The sample ranking model schema provided by Microsoft is shown here:

<rankingModel name="string" id="GUID" description="string" xmlns="http://schemas.microsoft.com/office/2009/rankingModel">
  <queryDependentFeatures>
    <queryDependentFeature pid="PID" name="string" weight="weightValue" lengthNormalization="lengthNormalizationSetting" />
  </queryDependentFeatures>
  <queryIndependentFeatures>
    <categoryFeature pid="PID" default="defaultValue" name="string">
      <category value="categoryValue" name="string" weight="weightValue" />
    </categoryFeature>
    <languageFeature pid="PID" name="string" default="defaultValue" weight="weightValue" />
    <queryIndependentFeature pid="PID" name="string" default="defaultValue" weight="weightValue">
      <transformRational k="value" />
      <transformInvRational k="value" />
      <transformLinear max="maxValue" />
    </queryIndependentFeature>
  </queryIndependentFeatures>
</rankingModel>

You can see that there are two main areas in the XML: query dependent features and query independent features. Others on the web have described these as dynamic and static ranking considerations. That works if you like, but basically, query dependent features are those ranking considerations that depend on the specific query. So if the query is ‘dog’, they are calculated from the text that matches ‘dog’ in the properties listed under the query dependent features. Query independent features are those properties that are considered regardless of the query. These can be seen as ranking values that are added to a specific document for any search. Generally, these are values like click depth, relation to an authoritative page, and popularity.
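The transform elements in the schema shape how a raw query independent value (such as a click distance) turns into a score. As far as I understand Microsoft’s schema documentation, transformRational computes x / (k + x) and transformInvRational computes 1 / (k * x + 1); treat both formulas, and the idea that the result is simply multiplied by the weight, as assumptions rather than confirmed behavior. The Python sketch below uses a k and a weight taken from a click distance feature purely as sample numbers.

```python
def transform_rational(x, k):
    """Assumed rational transform: grows from 0 towards 1 as x increases."""
    return x / (k + x)

def transform_inv_rational(x, k):
    """Assumed inverse rational transform: falls from 1 towards 0 as x increases."""
    return 1.0 / (k * x + 1.0)

# Sample values only; the weighting step below is an assumption.
k = 0.0900786349287429
weight = 1.86902034145632

contributions = []
for click_distance in (0, 1, 5, 20):
    contribution = weight * transform_inv_rational(click_distance, k)
    contributions.append(contribution)
    print(click_distance, round(contribution, 3))
```

Under these assumptions, an inverse rational transform rewards small values (pages few clicks away from an authoritative page), while a rational transform rewards large ones.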

Note: I haven’t figured out a good example for the category feature yet but will make another post on that later.

Within the tags there are several attributes that need values:

PID: the PID of the property that should be considered. Use the following PowerShell to get a list of usable properties:

Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchMetadataManagedProperty

Name: the name of the property.

Weight: how much weight to give this property. This is tricky, so be careful.

LengthNormalization: a factor that determines how document length should factor into the weight. This number is very hard to understand and it is best to stick with the default values.

This brings me to my next point.

What are the default values?

I attempted to start with the default schema and add my own properties. At first, I added only one property, expecting the search to still consider everything else, but only documents with that property were returned. I then had to add what I thought were the other properties to consider and guess at their values. Eventually, I opened the Search Administration database in SQL Server and looked at the XML of the default algorithms.

In the SQL database you can see the XML of all the ranking algorithms and what their default values are. To make a long story short, this is where I got the list of properties that I wanted to consider for the algorithm and the best default values for them. By capturing these values, I was able to nearly recreate the behavior of the default algorithm. Then I could add an additional property and give it a substantially higher weight, forcing that property to count for more in the ranking without adversely affecting the overall ranking.

Creating a Model

See the attached PowerShell scripts for creating and modifying the algorithm:

SurfRank.ps1 – Script to create a new algorithm with a near-default schema

SurfRankUpdate.ps1 – Script for updating that algorithm while you tweak it

NOTE: Change the file extensions from txt to ps1 and call the scripts from the SharePoint Management Shell.

You need to start by creating a new algorithm, and then modify and update it with a Set cmdlet.

Use this command to create a new model:

Get-SPEnterpriseSearchServiceApplication | New-SPEnterpriseSearchRankingModel -RankingModelXML "{XML Schema}"

Use this command to update it (where the GUID is your GUID):

Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchRankingModel {your GUID} | Set-SPEnterpriseSearchRankingModel -RankingModelXML "{XML Schema}"

NOTE: You will need a GUID for your new algorithm. Just search the web for ‘create a guid’ and use an online generator.
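If you would rather not rely on a website, any tool that produces a random GUID will do. As one option, this short Python sketch uses the standard library’s uuid module; the variable name is arbitrary.

```python
import uuid

# uuid4() returns a random (version 4) GUID; str() gives the usual
# 8-4-4-4-12 hexadecimal form expected in the model's id attribute.
model_id = str(uuid.uuid4())
print(model_id)
```

Paste the printed value into the id attribute of your rankingModel element and use the same value when you later update the model.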

Here is the XML from my modified schema after collecting the values from the default schema that I thought were relevant:

<?xml version='1.0'?><rankingModel name='SurfRank2' id='8447b4bc-3582-45c5-9cb8-ba2a319d850e' description='SurfRank2' xmlns='http://schemas.microsoft.com/office/2009/rankingModel'>
<queryDependentFeatures>
<queryDependentFeature name='Dreams' pid='407' weight='10.593169953073459' lengthNormalization='2.28258134389272'/>
<queryDependentFeature name='Body' pid='1' weight='0.0125145559138435' lengthNormalization='0.0474870346616999'/>
<queryDependentFeature name='Title' pid='2' weight='1.46602125767061' lengthNormalization='0.549393313908594'/>
<queryDependentFeature name='Author' pid='3' weight='0.410225403867996' lengthNormalization='1.0563226501349'/>
<queryDependentFeature name='DisplayName' pid='56' weight='0.570071355441683' lengthNormalization='0.552529462971364'/>
<queryDependentFeature name='ExtractedTitle' pid='302' weight='1.67377875011698' lengthNormalization='0.600572652201123'/>
<queryDependentFeature name='SocialTag' pid='264' weight='0.593169953073459' lengthNormalization='2.28258134389272'/>
<queryDependentFeature name='QLogClickedText' pid='100' weight='1.87179361911171' lengthNormalization='3.31081658691434'/>
<queryDependentFeature name='AnchorText' pid='10' weight='0.593169953073459' lengthNormalization='2.28258134389272'/>
</queryDependentFeatures>
<queryIndependentFeatures>
<queryIndependentFeature name='ClickDistance' pid='96' default='5' weight='1.86902034145632'><transformInvRational k='0.0900786349287429'/></queryIndependentFeature>
<queryIndependentFeature name='URLDepth' pid='303' default='3' weight='1.68597497899313'><transformInvRational k='0.0515178916330992'/></queryIndependentFeature>
<queryIndependentFeature name='Lastclick' pid='341' default='0' weight='0.219043069749249'><transformRational k='5.44735200915216'/></queryIndependentFeature>
<languageFeature name='Language' pid='5' default='1' weight='-0.56841237556044'/>
</queryIndependentFeatures>
</rankingModel>
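Before loading a model like this into SharePoint, it can be useful to check that the XML parses and to see which properties carry the most weight. The Python sketch below does that for a shortened copy of the model above (only three of the query dependent features are included to keep it brief); the function name is my own.

```python
import xml.etree.ElementTree as ET

# The schema's default namespace must be given explicitly when searching.
NS = {"rm": "http://schemas.microsoft.com/office/2009/rankingModel"}

# Shortened copy of the model, written without whitespace between tags.
MODEL_XML = (
    "<rankingModel name='SurfRank2' id='8447b4bc-3582-45c5-9cb8-ba2a319d850e' "
    "description='SurfRank2' xmlns='http://schemas.microsoft.com/office/2009/rankingModel'>"
    "<queryDependentFeatures>"
    "<queryDependentFeature name='Dreams' pid='407' weight='10.593169953073459' lengthNormalization='2.28258134389272'/>"
    "<queryDependentFeature name='Body' pid='1' weight='0.0125145559138435' lengthNormalization='0.0474870346616999'/>"
    "<queryDependentFeature name='Title' pid='2' weight='1.46602125767061' lengthNormalization='0.549393313908594'/>"
    "</queryDependentFeatures>"
    "</rankingModel>"
)

def query_dependent_weights(model_xml):
    """Return (name, weight) pairs for query dependent features, heaviest first."""
    root = ET.fromstring(model_xml)  # raises ParseError on malformed XML
    features = root.findall("rm:queryDependentFeatures/rm:queryDependentFeature", NS)
    pairs = [(f.get("name"), float(f.get("weight"))) for f in features]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

weights = query_dependent_weights(MODEL_XML)
for name, weight in weights:
    print(f"{name:10s} {weight:.3f}")
```

A listing like this makes it easy to see at a glance how far a boosted property sits above the near-default weights of the others.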

Note the queryDependentFeature using the ‘Dreams’ property. This is a property that I created on my SharePoint site and decided to rank higher. You will want to add a feature with the appropriate property ID (pid) for your own valuable property.

 

There is a chapter in the book (chapter 10) that has more information on this and we will be providing a webcast on www.surfray.com on how to do it, so please check out those resources. If it doesn’t work, leave a comment and I’ll try to help you move forward with it. I am hoping to access some more information on how this works at the SharePoint conference next week.

Robert

 

Category: Developer Articles