Gamification in Information Retrieval

My last article was mainly about Collaborative Information Seeking (CIS), one of the trends in enterprise search. Another interesting topic is the use of game mechanics in CIS systems. I came across this idea during the previously mentioned ESE 2014 conference, and interest is so high that this year a GamifIR workshop (Gamification for Information Retrieval) took place in Amsterdam. The IR community debated what benefits IR tasks can gain from game techniques. The workshop covered gamified tasks in the context of searching, natural language processing, analyzing user behavior, and collaborating. The last of these was discussed in the article “Enhancing Collaborative Search Systems Engagement Through Gamification”, which Martin White mentioned in his great presentation about search trends at the last ESE summit.

Gamification is the use of game elements in a non-game environment. Its goal is to improve the motivation of customers or employees to use a service. In the case of Information Retrieval, this means, for example, encouraging people to find information more efficiently. It is quite intuitive, because competition is an inherent part of human nature. Businesses noticed long ago that raising engagement, activating new users, establishing interaction between them, and rewarding effort lead to measurable results, even if the quality of the data supplied by users could be higher. Typical game elements include leaderboards, levels, badges, achievements, time or resource limits, challenges, and many others. There are even documented design patterns and models covering gameplay, components, game practices, and processes. Such rules are essential, because a virtual badge has no value until users assign one to it.

Collaborative Information Seeking is an idea suited to people cooperating on a complex task that requires finding specific information. Such systems support teamwork, coordinate actions, and improve communication in many different ways, using various mechanisms. At first glance, gamification seems perfectly suited to CIS projects: seekers become more social, and a feeling of competence fosters actions, which are rewarded in turn.

The most important thing is to know why we need a gamified system and what benefits we will get. The next step is to understand the fundamental elements of a game and find out how to adapt them to the IR case. In their article “Enhancing Collaborative Search Systems Engagement Through Gamification”, researchers from the universities of Granada and Holguin listed proposals for gamifying a CIS system. Based on their suggestions, I think the essential point is to prepare a highly sociable environment for seekers. Every player (seeker) needs a personal profile that stores previous achievements and can be customized. Constant feedback on progress, a list of successful members, time limits, and all kinds of widgets that keep the spirit of competition alive are important for motivating users and building loyalty. It is worth remembering that points collected for achieving goals need to be converted into virtual values that distinguish the most active players. It is crucial to set clear and fair rules, because information seeking with such elements is often fun, and that fun should not be ruined.
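
The profile, points, and leaderboard ideas above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the level names, the point thresholds, and the `SeekerProfile` class are hypothetical and do not come from the cited article.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds: the cited article does not prescribe any values.
LEVELS = [(0, "Novice"), (100, "Seeker"), (500, "Expert")]


@dataclass
class SeekerProfile:
    """Personal profile that stores a seeker's points and achievements."""
    name: str
    points: int = 0
    achievements: list = field(default_factory=list)

    def reward(self, achievement: str, points: int) -> None:
        """Record an achievement and add its points to the profile."""
        self.achievements.append(achievement)
        self.points += points

    @property
    def level(self) -> str:
        """Convert collected points into a visible level (a 'virtual value')."""
        current = LEVELS[0][1]
        for threshold, title in LEVELS:
            if self.points >= threshold:
                current = title
        return current


def leaderboard(profiles, top=3):
    """Return the most active seekers, highest points first."""
    return sorted(profiles, key=lambda p: p.points, reverse=True)[:top]


anna = SeekerProfile("anna")
ben = SeekerProfile("ben")
anna.reward("found the contract template", 120)
ben.reward("tagged ten documents", 40)
for profile in leaderboard([anna, ben]):
    print(profile.name, profile.points, profile.level)
```

Keeping the thresholds in one table makes the rules easy to publish, which fits the point about clear and fair principles.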

Researchers from Finnish universities, who published the article “Does Gamification Work?”, broke the problem of gamification down into components and studied them thoroughly. Their conclusion was that gamification can work, but with some caveats: the context that is being gamified and the qualities of the users. Probably the main problem is the lack of knowledge about which elements really provide benefits.

Gamification can be treated as a new way to deal with complex data structures. The limitations of data analysis can be offset by mechanisms that increase user activity in the Information Retrieval process. What is more, such a concept may lead to higher-quality data, because people are more motivated. I believe that Collaborative Information Seeking, gamification, and similar ideas are among the ways to improve the search experience by helping people become better searchers, rather than by just tuning algorithms.

New look for the GSA-powered file share search at Implement Consulting Group

The file share search on Implement Consulting Group’s intranet is driven by a Google Search Appliance (GSA). Recently, with help from Findwise, the search interface was given a new look that integrates more seamlessly with the overall design of the intranet.

GSA comes with a default search interface similar to the Google.com search. The interface is easy to customize from GSA’s administrative interface; however, some features are simply not customizable by clicking around. Therefore, GSA supports editing an XSLT file to customize the search. GSA returns the search results in XML format, and by processing this XML with XSLT we can customize how the search results look and behave.
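
To give a feel for what such a transformation does, here is a minimal sketch that turns result XML into an HTML list. It is not the XSLT we used: it swaps in Python's `xml.etree.ElementTree` purely for illustration, and the sample XML is a simplified assumption about the shape of the GSA output (`R` result elements with `U`, `T`, and `S` children for URL, title, and snippet). The URLs and the `render_results` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified, assumed shape of a GSA result document; real output has more fields.
SAMPLE_XML = """<GSP>
  <RES>
    <R N="1"><U>http://intranet.example/plan.pdf</U><T>Project plan</T><S>Plan for Q3</S></R>
    <R N="2"><U>http://intranet.example/budget.xlsx</U><T>Budget &amp; forecast</T><S>Draft</S></R>
  </RES>
</GSP>"""


def render_results(xml_text: str) -> str:
    """Turn each <R> result element into an HTML list item."""
    root = ET.fromstring(xml_text)
    items = []
    for result in root.iter("R"):
        url = result.findtext("U", default="")
        title = result.findtext("T", default=url)
        snippet = result.findtext("S", default="")
        items.append(
            f'<li class="result"><a href="{escape(url)}">{escape(title)}</a>'
            f"<p>{escape(snippet)}</p></li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(render_results(SAMPLE_XML))
```

The CSS class on each list item is where custom styling would hook in, which is the same role the custom CSS plays in the real XSLT front end.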

Custom CSS and JavaScript were used to integrate GSA’s search functionality into the look and feel of the intranet. Implement’s new intranet is based on thoughtfarmer.com and the design was delivered by 1508.dk.

And here is the search results page with a new look:


The new look of the search results page on Implement Consulting Group’s Google Search Appliance powered search

The search experience in SharePoint 2013: customised or targeted?

This post is the fourth in a series of four articles providing several best practices on how to implement and customise the search experience in SharePoint 2013. The previous posts listed the differences between the cloud and on-premise SharePoint, provided considerations when upgrading to SharePoint 2013, and dealt with the practicalities of configuring search in SharePoint Online. This fourth post handles the more advanced topic of ranking results and the future of search in SharePoint.

Managing ranking

We’ve previously mentioned query rules as a way to change the ranking of the search results based on your requirements. They allow you to promote certain search results or search result blocks to the top of the ranked search results, and more advanced query rules even allow you to change the ranking of the search results based on the query terms.

By using query rules, customising the search results web part, and adding a few content by search web parts, you can change the behaviour of the search depending on which user is accessing it. You would also need good metadata to make this work, but having a complete user profile (including the job title, department, and interests) is a good start. Based on such user information, you can define what the search experience for that user will be.

Changing ranking using query rules, however, requires a query rule condition, which describes the prerequisites that the query must fulfil in order for the query rule to fire. To change the results for all queries, you can use the following approach.

If the default ranking does not satisfy your search requirements and you want to change the order of the ranked search results, SharePoint provides the possibility of changing the ranking models. It is a feature available in SharePoint Online as well, as described in the TechNet documentation: “SharePoint Online customers need to download and install the free Rank Model Tuning App in order to create and customize ranking models.”

A ranking model contains the features and corresponding weights that are used in calculating a score for each search result. Changing the ranking models might require deeper, more theoretical knowledge of how search works, and those who take on the challenge of changing the ranking model are often dedicated search administrators or external specialised consultants.
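
As a rough illustration of "features and corresponding weights", the sketch below scores results with a plain weighted sum. This is an assumption made for readability: SharePoint's actual ranking models are not described here as simple linear sums, and the feature names and weights are hypothetical.

```python
# Hypothetical features and weights; not taken from any SharePoint ranking model.
WEIGHTS = {
    "title_match": 2.0,
    "body_match": 1.0,
    "is_powerpoint": 0.5,
    "is_official": 1.5,
}


def score(features, weights=WEIGHTS):
    """Weighted sum of feature values; unknown features contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank(results, weights=WEIGHTS):
    """Order results by descending score under the given weights."""
    return sorted(results, key=lambda r: score(r["features"], weights), reverse=True)


results = [
    {"title": "Draft slides", "features": {"title_match": 1, "is_powerpoint": 1}},
    {"title": "Official policy", "features": {"title_match": 1, "is_official": 1}},
]

for result in rank(results):
    print(result["title"], score(result["features"]))
```

Changing a single weight reorders every result list that uses the model, which is why tuning needs care.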

The Ranking Model Tuning app is free on the App Store - http://office.microsoft.com/en-001/store/ranking-model-tuning-WA104192565.aspx


The Rank Model Tuning App provides a user interface for creating custom ranking models, and can be used for both SharePoint Online and SharePoint Server, though in SharePoint Server 2013 it is also possible to use PowerShell to customise ranking models. New models are based on existing ranking models, in which you can add or remove rank features and tune the weight of a rank feature. The app also allows you to evaluate the new ranking model using a test set of queries. The set of test queries can be constructed from real user queries gathered from previous search logs, for example. How to use the tuning app is explained step by step in the documentation on the Office site.

Changing the weight of certain file types (say for example for PowerPoint documents compared to Excel documents) might be enough for many search implementations, but depending on the content, the features that influence the ranking of the search results can become more elaborate. For example, a property defining whether documents are either official or work-in-progress might become an important factor in determining the ranking of search results. SharePoint provides the liberty to create new properties, so it makes sense that these can be used in search to improve the relevance.

It should be pointed out, however, that changing the ranking model influences all searches that are run using that ranking model. Though the main idea of changing the ranking model is to improve the ranking, it can become much too easy to make changes that can have an undesirable effect on the ranking. This is why a proper evaluation of ranking changes needs to be part of your plan for improving search relevance.
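
One simple way to make such an evaluation concrete is to compare the current and the changed ranking on the same set of judged test queries. The sketch below uses mean reciprocal rank as the measure. That choice is mine, not something the tuning app is documented here to use, and the queries, document ids, and judgments are made up.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1/position of the first relevant document, or 0.0 if none is returned."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(run, judgments):
    """run: query -> ranked doc ids; judgments: query -> set of relevant doc ids."""
    scores = [reciprocal_rank(run.get(query, []), relevant)
              for query, relevant in judgments.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical test set built from search logs.
judgments = {"travel policy": {"d1"}, "expense form": {"d7"}}
baseline = {"travel policy": ["d3", "d1", "d2"], "expense form": ["d7", "d4"]}
candidate = {"travel policy": ["d1", "d3", "d2"], "expense form": ["d7", "d4"]}

print("baseline ", mean_reciprocal_rank(baseline, judgments))
print("candidate", mean_reciprocal_rank(candidate, judgments))
```

A change is only worth rolling out if the candidate does at least as well as the baseline across the whole test set, not just on the query that motivated it.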

The Office Graph and the future of social

The social features introduced in SharePoint 2013 provide a rich social experience, which is interconnected with the search experience. Many social features are driven by search (such as the recommendations for which people or documents to follow), and social factors also affect the search (such as finding the right expertise from conversations in your network).

In June 2012, Microsoft acquired the social enterprise platform Yammer. The SharePoint Server 2013 Preview was made available for download in July 2012, and it reached Release to Manufacturing (RTM) in October the same year. The new SharePoint 2013 implements new social features (see for example the newsfeed, the new mysites and the tagging system), many of which overlap with those available in Yammer! This brings us to the question on everyone’s mind since the acquisition of Yammer: what is the future of social in SharePoint? Should you use SharePoint’s social features or use Yammer?

In March 2014, Microsoft announced that it would not add new features to SharePoint’s social capabilities but would rather invest in the integration between Yammer and Office 365. The guidance is thus to go for Yammer.

“Go Yammer! While we’re committed to another on-premises release of SharePoint Server—and we’ll maintain its social capabilities—we don’t plan on adding new social features. Our investments in social will be focused on Yammer and Office 365” – Jared Spataro, Microsoft Office blog

Also at the SharePoint Conference in March 2014, Microsoft introduced the Office Graph, and with it Oslo, the first app demo using it. During the keynote, Microsoft mentioned that the Office Graph is “perhaps the biggest idea we’ve had since the beginning of SharePoint”. The Office Graph maps relationships between people, the documents they authored, the likes and posts they made, and the emails they received; it’s actually an extension of Yammer’s enterprise graph. The Oslo application leverages the graph in a way that looks familiar from Facebook’s Graph Search.
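
To make the idea of a graph of people and content more tangible, here is a toy sketch of typed relationships and one question you could ask of them. It says nothing about how the Office Graph is actually implemented; the `EnterpriseGraph` class, the relation names (including `works_with`), and the sample data are all hypothetical.

```python
from collections import defaultdict


class EnterpriseGraph:
    """Toy graph of typed edges: (source, relation) -> set of targets."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        """Record that `source` is linked to `target` by `relation`."""
        self._edges[(source, relation)].add(target)

    def targets(self, source, relation):
        """Everything `source` points to through `relation`."""
        return set(self._edges.get((source, relation), set()))

    def documents_from_network(self, person):
        """Documents authored or liked by the people this person works with."""
        documents = set()
        for colleague in self.targets(person, "works_with"):
            documents |= self.targets(colleague, "authored")
            documents |= self.targets(colleague, "liked")
        return documents


graph = EnterpriseGraph()
graph.add("ann", "works_with", "bob")
graph.add("bob", "authored", "roadmap.pptx")
graph.add("bob", "liked", "budget.xlsx")
graph.add("eve", "authored", "secret.docx")

print(sorted(graph.documents_from_network("ann")))
```

A query like this, "what are the people around me working on", is the kind of question a graph answers more naturally than a keyword index.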

The Office Graph, connecting people and information - Microsoft Office Blog http://blogs.office.com/2014/03/03/work-like-a-network-enterprise-social-and-the-future-of-work/


The new Office Graph provides exciting opportunities, and has consequences for how the search will be used. Findwise started exploring the area of enterprise graph search before Microsoft announced the Office Graph – see our post about the Enterprise Graph Search from January 2013.

Reluctant to go for the cloud?

Microsoft hinted during the SharePoint conference keynote in March that it will add new functionality to the cloud version first. Although the company is still committed to another version of SharePoint Server, new updates might come at a slower pace for the on-premise version. However, Microsoft also announced that SharePoint SP1 brings new functionality in the administrative interface: a hybrid setting which allows you to specify whether you want the social component in the cloud/Yammer, or your documents on OneDrive, so that you don’t need to move everything to the cloud overnight.

Let us know how far you’ve come with your SharePoint implementation! Contact us if you need help in deciding which version of SharePoint to choose, need help with tuning search relevance, have questions about improving search, or would like to work with us to reach the next level of findability.

Enterprise Search Europe 2014 – Short Review

ESE Summit

At the end of April, the third edition of the Enterprise Search Europe conference took place. The venue was the Park Plaza Victoria Hotel in London. The two-day event was dedicated to search solutions in the broad sense. There were two tracks covering subjects related to search management, big data, open source technologies, SharePoint and, as always, the future of search. According to the organizer’s information, 30 experts presented their knowledge and experience in implementing search systems and making content findable. It was an opportunity to get familiar with many case studies focused on relevancy, text mining, systems architecture, and even matching business requirements. There were also talks on softer skills, such as making decisions or finding good employees.

In a word, the ESE 2014 summit was a great chance to meet highly skilled professionals with competence in business-driven search solutions. Representatives of both specialized consulting companies and universities were present. The second day even started with a compelling plenary session about the direction of enterprise search. The presentation offered two points of view: that of Jeff Fried, CTO of BA-Insight, and that of Elaine Toms, Professor of Information Science at the University of Sheffield. From the industry perspective, analyzing user behavior, applying world knowledge, and improving information structure are real successes. On the other hand, although IR systems are now mainstream, many problems remain: integration is still a challenge, the rules by which systems work are unclear, and organizations neglect investment in search specialists. As Elaine Toms explained, the role of scientists is to reduce uncertainty by prototyping and by educating future researchers. According to her, the major search problems are primitive user interfaces and too few system services. What is more, data and information often become of secondary importance, even though they are the core of every search engine.

Trends

Among the many interesting presentations, one in particular caught my attention: “Collaborative Search” by Martin White, Conference Chair and Managing Director of Intranet Focus. Its subject was the current state of enterprise search and the requirements that such systems will have to face in the future. Martin White is convinced that limited user satisfaction is mainly the fault of poor content quality and insufficient information management. The presentation covered the absorbing results of various studies. One of them, described in the paper “Characterizing and Supporting Cross-Device Search Tasks”, analyzed commercial search engine logs to find behavior patterns associated with cross-device searching. Switching between devices can be a hindrance because of the multiplicity of devices: each user needs to remember both what they were searching for and what has already been found. The findings show that there are many opportunities to handle information seeking more effectively in a multi-device world. Saving and reinstating the user session, using the time between device switches to retrieve more results, and using behavioral and geospatial data to predict task resumption are just a few examples.

Nevertheless, the most interesting part of Martin White’s presentation was dedicated to Collaborative Information Seeking (CIS).

Collaborative Information Seeking

It is natural that difficult and complex tasks force people to work together. Collaboration in information retrieval helps people use systems more effectively. The idea concentrates on situations in which people should cooperate to seek information or make sense of it. In fact, CIS covers, on the one hand, elements connected with organizational behavior and decision making and, on the other, the evolution of user interfaces and the design of systems for immediate data processing. Furthermore, Martin White considers the CIS context to be focused on complex queries, “second phase” queries, results evaluation, and ranking algorithms. The concept can bring the most value in domains such as chemistry, medicine, and law.

While exploring CIS, I came across several related terms: collaborative information retrieval, social searching, co-browsing, collaborative navigation, collaborative information behavior, and collaborative information synthesis. My intention is to introduce some of them.

"Collaborative Information Seeking", Chirag Shah


Collaborative Information Retrieval (CIR) extends traditional IR to serve many users at once. It supports scenarios in which the problem is complicated and a group needs to seek common information. To support a group’s actions, it is crucial to know how the group works and what its strengths and weaknesses are. In general, such a system could be an overlay on a search engine that re-ranks results based on the knowledge of the user community. According to Chirag Shah, the author of the book “Collaborative Information Seeking”, there are examples of systems in which a workgroup’s queries and related results are captured and used to filter more relevant information for a particular user. One of the most absorbing cases is SearchTogether, an interface designed for collaborative web search, described by Meredith R. Morris and Eric Horvitz. It allows people to work both synchronously and asynchronously. The history of queries, page metadata, and annotations serve as information carriers for the user. It implemented both automatic and manual division of labor. One of its features was recommending pages to another information seeker. All sessions and past findings were persisted for future collaborative searching.

Despite the many efforts made to develop such systems, probably none of them has been widely adopted. Perhaps this was caused partly by their non-trivial nature and partly by the lack of a concept for integrating them with other parts of collaboration in organizations.

Other ideas associated with CIS are Social Search and Collaborative Filtering. The first is about how social interactions can help people search together. Interestingly, despite the rather weak ties between people in social networks, their enhancement may already be observed in collaborative networks. The second refers to providing more relevant search results based on the user’s past behavior, as well as on the community of users displaying similar interests. It is noteworthy that this is an example of asynchronous interaction, because its value is based on past actions, in contrast with CIS, where the emphasis is on active communication between users. Collaborative Filtering has been applied in many domains: industry, finance, insurance, and the web. At present the last one is the most common, and it is used in e-commerce. CF methods form the basis of recommender systems that predict user preferences. It is such a broad topic that it certainly deserves a separate article.
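
The core of Collaborative Filtering, recommending what users with similar past behavior found useful, fits in a short sketch. This is one simple user-based variant using Jaccard similarity over clicked documents. The choice of similarity measure and the sample click data are my own assumptions, not a description of any particular product.

```python
def jaccard(a, b):
    """Overlap between two sets of items, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, clicks, top=3):
    """Suggest items that similar users clicked but `user` has not seen yet.

    clicks: user -> set of item ids taken from past behavior.
    """
    seen = clicks.get(user, set())
    scores = {}
    for other, items in clicks.items():
        if other == user:
            continue
        similarity = jaccard(seen, items)
        if similarity == 0.0:
            continue  # users with nothing in common tell us nothing
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top]


clicks = {
    "ann": {"doc1", "doc2"},
    "bob": {"doc1", "doc2", "doc3"},
    "eve": {"doc9"},
}

print(recommend("ann", clicks))
```

Note that everything here is computed from past actions only, which is exactly the asynchronous character described above.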

CIS Barriers

Regardless of all this research, CIS faces many challenges today. One of them is information security within the company. How do you handle a situation in which team members do not have the same security profile, or in which someone cannot even share with others what has been found? The systems discussed cannot be built for information seeking alone; they also need to manage security and support situations in which results are not found because of permissions, or in which it is necessary to view a new document created in the course of cooperation. As if that were not enough, various organizational barriers hinder the CIS idea. They fall into four categories: organizational, technical, individual, and team. They include organizational culture and structure, multiple un-integrated systems, individual perception, and the various conflicts that arise during teamwork. The barriers and their implications are described in detail in the paper “Barriers to Collaborative Information Seeking in Organizations” by Arvind Karunakaran and Madhu Reddy.
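
The security problem can be illustrated with a small sketch of permission-aware results. It assumes a deliberately simple model in which each document lists the groups allowed to read it. The field name `allowed_groups`, the helper functions, and the sample team are hypothetical.

```python
def trim_results(results, user_groups):
    """Split results into those the user may see and a count of hidden ones."""
    visible, hidden = [], 0
    for doc in results:
        if doc["allowed_groups"] & user_groups:
            visible.append(doc)
        else:
            hidden += 1
    return visible, hidden


def can_share(doc, team):
    """A finding can be shared only if every team member may read it."""
    return all(bool(doc["allowed_groups"] & groups) for groups in team.values())


results = [
    {"title": "Merger memo", "allowed_groups": {"legal"}},
    {"title": "Holiday calendar", "allowed_groups": {"everyone"}},
]
team = {"ann": {"legal", "everyone"}, "bob": {"everyone"}}

visible, hidden = trim_results(results, team["bob"])
print([doc["title"] for doc in visible], "hidden:", hidden)
print(can_share(results[0], team))
```

Returning the number of hidden results lets the interface tell a seeker that something exists but is out of reach, instead of silently showing nothing.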

Collaborative information seeking is an exciting field of research and one of the search trends. Another absorbing topic is the adoption of gamification in IR systems. That is going to be the subject of my next article.

Customizing search in SharePoint Online

Search in SharePoint 2013 – Part 3: Customizing search in SharePoint Online

This post is the third in a series of four articles providing several best practices on how to implement and customise search in SharePoint. In the first post, we provided a brief overview of the differences in terms of search between the on-premise and cloud versions, and in the second blog post we discussed several things you should consider when migrating to the new SharePoint. In this post, we will mention several search features that can be configured in SharePoint Online, and we will specifically be referring to those available in the Enterprise Plan.

Here is a summary of what customisations for search in SharePoint Online will be discussed:

  • Defining your own custom result sources, and hiding any that you are not using
  • Setting up hybrid search if you chose a hybrid solution
  • Defining which refiners to show and how to display them
  • Adding query suggestions that are related to your organisation
  • Adding query spelling corrections
  • Changing how the search results are displayed to show previews and additional metadata

Get ready to search ‘everything’

This is the uncustomized search box that you will see on your search center page.  Please note that in some SharePoint Online plans the ‘Videos’ vertical is not available.


Everything is the default scope when performing a search in the SharePoint search center, and it returns every type of result from all of your site collections. There are a few other scopes (search verticals, or so-called Result Sources) that are included by default: People, Conversations, and Videos. These are preconfigured to search what you would expect.

  • You can add new result sources, say for example Reports, which shows only search results that are tagged with the keyword ‘Final Report’. You define yourself what the criteria for a result source should be.
  • If there is a result source that you are not using, say for example if you have no video content and don’t plan to have any in the near future, it’s less confusing for the users if you simply do not show it for now. It’s easy to add it back if you need it later.

If you choose a hybrid solution, your content is split between the online SharePoint and the on-premise SharePoint Server.

  • It’s possible to have one search that displays results from both locations. For example, to show results from the on-premise installation in SharePoint Online, you have to define a new result source that is able to retrieve the results from the on-premise installation. Then you can configure the search results page to show results from both result sources (everything from SharePoint Online plus everything from SharePoint on-premise that matches the search query).

Screenshot from the post Hybrid search by the Microsoft SharePoint Team Blog showing how results from the cloud are integrated in the search results page when the user searches from an on-premises SharePoint 2013 site.  Notice also the new visual refiner for date interval in the refinement panel on the left.


Drill down into the search results

The search Refiners allow the users to drill down into the search results. There is a new type of refiner in SharePoint 2013, a visual refiner, by default used for the ‘Modified Date’.

  • The way refiners are visualised has changed drastically, and you can define your own visualisation of the data if you want to. For example, what about a map as a refiner, instead of a list of city names?

By default, the refiners you will see are the Result Type (example values: Excel, Web page), Author (example values: John Doe, Jane Doe), and Modified Date (shown as a distribution of values).

  • If you edit the web part responsible for the refiners, you will be able to add other refiners as well. For example, company names are automatically extracted from your content, so it is easy to add them to your refiners.
  • Another useful refiner to show to your users is the Content Type, which offers one more level of detail than the Result Type refiner.

Search guidance

Query suggestions are displayed as the user types.


As the user types a query in the search box, SharePoint is able to show Query Suggestions that help complete the query. SharePoint automatically creates a list of suggestions based on previous searches. When at least 6 search results are clicked for a specific query, that query will be added to the list of suggestions.

  • Besides the list that SharePoint creates automatically, you are able to add your own list of suggestions. This is especially useful when starting fresh with your installation, since a fresh installation comes with no query suggestions. You could help the users by adding your company name, product names or similar to the initial list of suggestions. You will also find adding suggestions manually useful when reviewing the search logs, since these can give you a new perspective on what the users are looking for. Based on that input, you can use query suggestions to help guide your users to the relevant results.
  • You are also able to import a list of terms that should never be shown as suggestions. Say for example that your testing team uses a specific keyword for testing content. In this case, it is very probable that the test keyword will soon appear as a suggestion for all users. To avoid this, simply add the keyword to the query suggestion exclusion list.
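
The interplay between the automatic list, the manual list, and the exclusion list can be sketched as follows. This only mirrors the rules described above (a threshold of six clicked results, plus inclusions and exclusions) and is not SharePoint's implementation. The function names and the sample log are hypothetical.

```python
from collections import Counter

CLICK_THRESHOLD = 6  # from the text: at least 6 clicked results for a query


def build_suggestions(click_log, manual=(), excluded=()):
    """Combine automatic and manual suggestions, then drop excluded terms.

    click_log: iterable of query strings, one entry per clicked search result.
    """
    counts = Counter(query.strip().lower() for query in click_log)
    automatic = {query for query, n in counts.items() if n >= CLICK_THRESHOLD}
    included = {term.lower() for term in manual}
    blocked = {term.lower() for term in excluded}
    return sorted((automatic | included) - blocked)


def suggest(prefix, suggestions):
    """Suggestions that complete what the user has typed so far."""
    prefix = prefix.lower()
    return [s for s in suggestions if s.startswith(prefix)]


log = ["Findwise"] * 6 + ["testkeyword"] * 10 + ["rare query"] * 2
suggestions = build_suggestions(log, manual=["findability"], excluded=["testkeyword"])

print(suggestions)
print(suggest("find", suggestions))
```

In this sample the test keyword is clicked often enough to qualify automatically, and only the exclusion list keeps it away from other users.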

Similar to the query suggestions, another functionality whose purpose is to help the user in formulating the query is the Query Spelling Correction. An inclusion list and an exclusion list are used in this case as well. The only difference is that these are managed in the Term Store, while query suggestions are managed by importing a plain text file.

  • You can add your own terms in the query spelling correction inclusion and exclusion lists. Probably one of the most often misspelled words is the word ‘business’. Or was it ‘bussiness’? After adding this term to the list of words to be included in the spelling suggestions, the correct form of the word would be shown under the ‘Did you mean’ functionality if the user misspells it.
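
A ‘Did you mean’ lookup against an inclusion list can be approximated with the standard library. The sketch below uses `difflib.get_close_matches` as a stand-in similarity measure, since how SharePoint actually matches misspellings is not described here. The 0.8 cutoff and the word lists are assumptions.

```python
import difflib


def did_you_mean(term, inclusion, exclusion=()):
    """Return the closest correctly spelled term, or None if nothing fits.

    inclusion: terms that may be offered as corrections.
    exclusion: terms that must never be offered.
    """
    blocked = {word.lower() for word in exclusion}
    candidates = [w.lower() for w in inclusion if w.lower() not in blocked]
    term = term.lower()
    if term in candidates:
        return None  # already spelled as listed, nothing to suggest
    matches = difflib.get_close_matches(term, candidates, n=1, cutoff=0.8)
    return matches[0] if matches else None


inclusion_list = ["business", "findability"]

print(did_you_mean("bussiness", inclusion_list))
print(did_you_mean("business", inclusion_list))
```

The cutoff controls how aggressive the correction is: too low and unrelated words get suggested, too high and real typos slip through.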

Change how the search results are displayed

Screenshot from an Office Blogs post showing the hover panel for a PowerPoint document.


A final item on our list of proposed customisations for your search results is to change how the search results are displayed. In SharePoint 2013, it is the Display Templates that define how each element in the search results page is displayed. For example, there is a template for the refiner, another one for the hover panel of a PDF item, another one for the hover panel of a Word item, and so on.

  • A simple fix would be to make sure that you have previews for PDF files in the hover panel. It is the Office Web Apps that power the previews for Office documents (such as Word, PowerPoint, Excel), but the preview for PDF files might not be visible for you. If so, what you can do is change the display template that is associated with the PDF result type.
  • You can also define what metadata to show for each result type. For example, for a Word document you would by default be able to see the Title, a text snippet and a URL, and in the hover panel the document preview, Last Modified date and author, as well as probably a list of the main headings from the document. However, if you have added additional metadata to your document, such as Location or Keywords, you can display these in the search results as well by modifying the right display template.

You can find more information about how to administer many of these search functionalities on this Microsoft Office page and from our search experts. Let us know how far you are in implementing SharePoint Online for your organisation. We sure have a few more tips on how to configure and customize the search in SharePoint!

Cloud vs. on-premise SharePoint 2013 search

Search in SharePoint 2013 – Part 1: The difference between search within on-premise SharePoint 2013 and SharePoint Online

Cloud or on-premise? Findwise offers implementation and consulting services for both scenarios. This post is the first in a series of four articles providing several best practices on how to implement and customise search in SharePoint. The focus of this first post is introducing the difference between the cloud and on-premise SharePoint 2013 in terms of search features.

“The cloud is on fire”

That is a quote from the Microsoft Office General Manager Jared Spataro during his keynote at the SharePoint conference in Las Vegas last month. At this conference, Microsoft revealed that 60% of the Fortune 500 had adopted Office 365 in the previous 12 months. While new versions of on-premise SharePoint and Exchange Server are still promised for next year, Microsoft is adding more and more capabilities to the cloud version.

SPC14 Keynote summary

Fun random facts about SharePoint Online presented during the keynote at the SharePoint conference in Las Vegas this year (March 3rd 2014)

In addition to the numbers above, a market analysis report by The Radicati Group on the adoption of Microsoft SharePoint reveals that almost a quarter of the worldwide users accessing SharePoint deployments made during 2013 are using the cloud-based SharePoint.

When deciding whether to go for the on-premise or cloud solution, a go-to resource for your IT team is the TechNet article describing the availability of features across the solutions. That article not only divides the features between on-premise and cloud, but also between the different Office 365 and SharePoint Online plans. What is the difference? SharePoint Online is the cloud version of SharePoint Server, but it can be deployed as a standalone service or as part of the Office 365 suite, so different plans are usually listed for these different scenarios. There are also the Office 365 Dedicated plans, but these are out of scope for this article. The Microsoft Office site has a more business-oriented comparison of the different plans, including pricing. If you cannot decide on one or the other, there is also the possibility of a hybrid solution!

Availability of search features. Column key:
  A = Office 365 Small Business, Office 365 Small Business Premium
  B = Office 365 Midsize Business, Office 365 Enterprise E1 or K1, Office 365 Education A2, Office 365 Government G1 or K1
  C = Office 365 Enterprise E3 or E4, Office 365 Education A3 or A4, Office 365 Government G3 or G4
  D = SharePoint Online Plan 1
  E = SharePoint Online Plan 2
  F = SharePoint Foundation 2013
  G = SharePoint Server 2013 Standard CAL
  H = SharePoint Server 2013 Enterprise CAL

Search feature                      A    B    C    D    E    F    G    H

Available within all plans
Phonetic name matching              Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Expertise Search                    Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Quick preview                       Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
RESTful Query API/Query OM          Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Result sources                      Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Search results sorting              Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Ranking models                      Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Query spelling correction           Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Refiners                            Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Manage search schema                Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes

Available in all Office 365 and SharePoint Online plans
Deep links                          Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Event-based relevancy               Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Graphical refiners                  Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Recommendations                     Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Search vertical: “Conversations”    Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Search vertical: “People”           Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Query suggestions                   Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Query throttling                    Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
“This List” searches                Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Query rules: add promoted results   Yes  Yes  Yes  Yes  Yes  No   Yes  Yes

Available in Office 365 (varies by plan)
Advanced Content Processing         Yes  Yes  Yes  No   No   Yes  Yes  Yes
Hybrid search                       No   Yes  Yes  Yes  Yes  Yes  Yes  Yes
Query rules: advanced actions       No   No   Yes  No   No   No   No   Yes
Search vertical: “Video”            No   No   Yes  No   Yes  No   No   Yes

Not available in any of the Office 365 or SharePoint Online plans
Search connector framework          No   No   No   No   No   No   Yes  Yes
Custom entity extraction            No   No   No   No   No   No   No   Yes
Extensible content processing       No   No   No   No   No   No   No   Yes

– Simplified view of the TechNet article, focusing on the search features availability across SharePoint solutions

Limitations in Office 365 and SharePoint Online plans

Is the cloud version good enough for your organisation when it comes to search features? The table above illustrates some of the things that you might be missing in terms of search, and in what follows we will discuss those whose availability varies amongst the Office 365 or SharePoint Online plans.

Query rules – advanced actions

In order to adapt the relevance of the search results to the user intent, SharePoint 2013 adds a new feature called query rules. A query rule is defined by a condition and a corresponding action to be taken when the condition is met. Within some SharePoint Online licenses, this functionality is limited to adding promoted results, while more advanced actions are left out. Promoted results are similar to what was known in previous SharePoint versions as search keywords, or best bets, and let you promote specific results to the top of the ranked search results. The more advanced actions could consist of, for example, changing the query, or changing the ranking of the search results by promoting a certain group of results. You can read more about various usages of query rules in one of our previous blog posts.
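To make the condition/action pairing concrete, here is a minimal Python sketch. It is not the SharePoint API: the QueryRule class, the promote helper and the example URLs are all hypothetical names chosen for illustration, and a promoted result is simply modelled as moving one URL to the top of a result list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueryRule:
    # Hypothetical structure for illustration; not a SharePoint class.
    condition: Callable[[str], bool]                 # does the rule fire for this query?
    action: Callable[[str, List[str]], List[str]]    # returns the adjusted result list


def promote(url: str) -> Callable[[str, List[str]], List[str]]:
    """Build an action that places one result on top (a 'promoted result')."""
    def action(query: str, results: List[str]) -> List[str]:
        return [url] + [r for r in results if r != url]
    return action


def apply_rules(query: str, results: List[str], rules: List[QueryRule]) -> List[str]:
    """Run every rule whose condition matches the query, in order."""
    for rule in rules:
        if rule.condition(query):
            results = rule.action(query, results)
    return results


# Example: promote the vacation policy whenever the query mentions "vacation".
rules = [QueryRule(lambda q: "vacation" in q.lower(), promote("/hr/vacation-policy"))]
ranked = ["/news/summer-party", "/hr/vacation-policy", "/hr/forms"]
print(apply_rules("Vacation days", ranked, rules))
```

A more advanced action in the same sketch would be another function with the same signature, for example one that rewrites the query or reorders a whole group of results.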

Search Connector Framework and Hybrid Search

Administrators of SharePoint Online will miss the ability to manage search connectors to different content sources, since the search connector framework is not available. Only SharePoint content that is stored online is indexed. Search results can only be retrieved from that content, or can be set up to be retrieved from an Exchange Server, from a remote SharePoint, or from a search engine that uses the OpenSearch protocol. As an alternative approach to making content from other sources searchable, you can set up hybrid search. This feature is available in almost all Office 365 and SharePoint Online scenarios. It lets you show users search results from content available both in the cloud and on-premise. So if you would like to index a content source that is not supported in SharePoint Online, you should be able to index it in the on-premise installation.

Custom Entity Extraction

The TechNet article describing features across solutions shows that this feature is only available with the enterprise licensing of SharePoint Server. It allows custom-defined terms to be extracted from your content and used as search refiners. Say, for example, that you would like to extract all of your current product names from the content of your documents and then be able to refine your search results by product name.
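The idea of dictionary-based extraction can be sketched in a few lines of Python. This is only an illustration of the concept, not how SharePoint implements it; the function name and the product names are made up for the example.

```python
import re
from typing import Iterable, List


def extract_entities(text: str, dictionary: Iterable[str]) -> List[str]:
    """Return the dictionary terms that occur in the text as whole words (case-insensitive)."""
    found = []
    for term in dictionary:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


products = ["Alpha Search", "Beta Indexer"]  # hypothetical product names
doc = "Release notes for alpha search 2.0 and the new crawler."
print(extract_entities(doc, products))
```

The extracted terms would then be stored as a property of the document, which is what makes them usable as a refiner.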

Content Processing Extensibility

The other search feature that is only available with the enterprise licensing of SharePoint Server is the content processing extensibility. In practice, this means there is an API that can be used to transform the data before it is stored in the index. For example, more advanced entity extraction can be done at this step. While the custom entity extraction discussed previously identifies names in the content based on a pre-defined list of names, through this API you can, for example, use a trained model to do entity extraction. Additional use cases could be cleaning or normalising the data according to predefined rules (for example, lowercasing all values in a property), or automatically tagging items based on the content.
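As a rough illustration of the kind of transformation such a step might perform, the Python sketch below lowercases one property and tags an item based on keywords in its body. The property names ("department", "body", "tags") and the keyword lists are assumptions made for the example, and the function is not part of any SharePoint interface.

```python
from typing import Dict, List


def process_item(item: Dict[str, object], tag_keywords: Dict[str, List[str]]) -> Dict[str, object]:
    """Return a cleaned copy of the item, ready to be sent to the index."""
    processed = dict(item)

    # Normalise: lowercase the value of one property.
    department = processed.get("department")
    if isinstance(department, str):
        processed["department"] = department.lower()

    # Tag: add every tag for which at least one keyword occurs in the body.
    body = str(processed.get("body", "")).lower()
    processed["tags"] = sorted(
        tag for tag, keywords in tag_keywords.items()
        if any(keyword in body for keyword in keywords)
    )
    return processed


keywords = {"finance": ["invoice", "budget"], "hr": ["vacation"]}
item = {"department": "Sales", "body": "Quarterly invoice for the customer"}
print(process_item(item, keywords))
```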

It should be noted that the TechNet article is not a comprehensive list, but rather gives an overview of the major differences between solutions. Here is, for example, one more feature whose availability is limited.

Synonyms

One of the missing features in SharePoint Online that is available in the on-premise solution is the possibility of defining synonyms. Since it is all too easy to communicate the same thing with different words, defining synonyms or abbreviations for search phrases can help aggregate the results for the multiple ways of expressing the same information need. We hope that Microsoft will integrate this feature in future versions of SharePoint Online as well.
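What synonyms achieve can be illustrated with a small query expansion sketch in Python. It is a simplified model under our own assumptions (synonyms are given as groups of interchangeable phrases, and each variant is derived from the original query only); it does not reflect how SharePoint stores or applies its thesaurus.

```python
import re
from typing import List, Set


def expand_query(query: str, synonym_groups: List[List[str]]) -> Set[str]:
    """Return the lowercased query plus one variant per synonym of any phrase it contains."""
    q = query.lower()
    variants = {q}
    for group in synonym_groups:
        for phrase in group:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            if re.search(pattern, q):
                # Swap the matched phrase for every other member of its group.
                for alternative in group:
                    variants.add(re.sub(pattern, alternative, q))
    return variants


groups = [["hr", "human resources"], ["cv", "resume"]]  # example synonym groups
print(sorted(expand_query("HR policy", groups)))
```

Searching for all returned variants aggregates the results for the different ways of expressing the same need.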

Find the right documentation

When searching the Microsoft Office.com website or TechNet for which functionality is available across solutions, make sure to check that the discussed functionality applies to your version of SharePoint. Articles usually indicate which versions the functionality applies to.

Feature availability in MS articles

Articles on Office.com (left) and TechNet (right) indicate which version of SharePoint the discussed topic applies to.

Please note that things might change: new updates in SharePoint Online can add functionality that was missing before. To stay up to date, check the TechNet page once in a while, or contact us to help you map your requirements to the available search features across solutions.

Event driven indexing for SharePoint 2013

In a previous post, we explained the continuous crawl, a new feature in SharePoint 2013 that overcomes previous limitations of the incremental crawl by closing the gap between the time when a document is updated and when the change is visible in search. A different concept in this area is event driven indexing.

Content pull vs. content push

In the case of event driven indexing, the index is updated in real time as an item is added or changed. The event of updating the item triggers the actual indexing of that item, i.e. pushes the content to the index. Similarly, deleting an item results in the item being deleted from the index immediately, making it unavailable in the search results.

The three types of crawl available in SharePoint 2013 (full, incremental and continuous) all use the opposite method, pulling content. A crawl is initiated by the user or scheduled to start at a specified time or at regular intervals.

The following image outlines the two scenarios: the first one illustrates crawling content on demand (as it is done for the full, incremental and continuous crawls) and the second one illustrates event-driven indexing (immediately pushing content to the index on an update).

Pulling vs pushing content, showing the advantage of event driven indexing

Example use cases

The following examples are only some of the use cases where an event-driven push connector can make a big difference in terms of the time until the users can access new content or newest versions of existing content:

  • Be alerted instantly when an item of interest is added in SharePoint by another user.
  • Have deleted content removed from search immediately.
  • Avoid the annoying situation of adding or updating a document in SharePoint and not being able to find it in search.
  • View real-time calculations and dashboards based on your content.

Findwise SharePoint Push connector

Findwise has developed a connector for its SharePoint customers that does event driven indexing of SharePoint content. After installing the connector, a full crawl of the content is required, after which all updates will be instantly available in search. The delay between the time a document is updated and when it becomes available in search is reduced to the time it takes to process the document (that is, to convert it from what you see to a corresponding representation in the search index).

Both FAST ESP and FAST Search for SharePoint 2010 (FS4SP) allow for pushing content to the index; however, this capability was removed from SharePoint 2013. This means that even though we can capture changes to content in real time, we are missing the interface for sending the update to the search index. This might be a game changer for you if you want to use SharePoint 2013 and take advantage of event driven indexing, since it means you would have to use another search engine that has an interface for pushing content to the index. We have ourselves used a free open source search engine for this purpose. By moving the search index outside the SharePoint environment, the search can be integrated with other enterprise platforms, opening up possibilities for connecting different systems together through search. Findwise can assist you with choosing the right tools to get the desired search solution.

Another aspect of event driven indexing is that it limits the resources required to traverse a SharePoint instance. Instead of having a continuously running process that looks for changes, changes are reported automatically when they occur, which limits the work required to pick them up. This is an important aspect, since the resource demand for keeping an index up to date can at times be very high in SharePoint installations.

There is also a downside to consider when working with push driven indexing. It is more difficult to keep a state of the index in case problems occur. For example, if one of the components of the connector goes down and no pushed data is received during a time interval, it becomes more difficult to follow up on what went missing. To catch the data that was added or updated during the down period, a full crawl needs to be run. Catching deletes is solved by either keeping a state of the current indexed data, or comparing it with the actual search engine index during the full crawl. Findwise has worked extensively on choosing reliable components with a high focus on robustness and stability.
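The follow-up after a down period can be pictured as a comparison between what the source contains and what the index contains. The Python sketch below assumes, purely for illustration, that both sides can be represented as a mapping from item id to a version number; it is not taken from the Findwise connector.

```python
from typing import Dict, Set, Tuple


def reconcile(source: Dict[str, int], index: Dict[str, int]) -> Tuple[Set[str], Set[str]]:
    """Compare item versions in the source with those in the index.

    Returns (to_index, to_delete): items that are new or changed in the source,
    and items that are still in the index but no longer exist in the source.
    """
    to_index = {
        item_id for item_id, version in source.items()
        if index.get(item_id) != version
    }
    to_delete = set(index) - set(source)
    return to_index, to_delete


source_state = {"a": 1, "b": 3, "d": 1}   # state seen during the full crawl
index_state = {"a": 1, "b": 2, "c": 5}    # state kept for the search index
print(reconcile(source_state, index_state))
```

Item "b" was updated and "d" was added during the down period, so both are re-indexed, while "c" was deleted in the source and is removed from the index.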

The push connector was used in projects with both SharePoint 2010 and 2013 and tested with SharePoint 2007 internally. Unfortunately, SharePoint 2007 has a limited set of event receivers which limits the possibility of pure event driven indexing. Also, at the moment the connector cannot be used with SharePoint Online.

You will probably be able to add a few more examples to the use cases for event driven indexing listed in this post. Let us know what you think! And get in touch with us if you are interested in finding out more about the benefits and implications of event driven indexing and in learning how to reach the next level of findability.

Continuous crawl in SharePoint 2013

Continuous crawl is one of the new features that come with SharePoint 2013. As an alternative to incremental crawl, it promises to improve the freshness of the search results, that is, to shorten the time between when an item is updated in SharePoint by a user and when it becomes available in search.

Understanding how this new functionality works is especially important for SharePoint implementations where content changes often and/or where content is required to be searchable instantly. Moreover, since many of the new SharePoint 2013 functionalities depend on search (see the social features, the popular items, or the content by search web parts), understanding continuous crawl and planning accordingly can help align user expectations with the technical capabilities of the search engine.

Both the incremental crawl and the continuous crawl look for items that were added, changed or deleted since the last successful crawl, and update the index accordingly. However, the continuous crawl overcomes the limitation of the incremental crawl, since multiple continuous crawls can run at the same time. Previously, an incremental crawl would start only after the previous incremental crawl had finished.

Limitation to content sources

Content not stored in SharePoint will not benefit from this new feature. Continuous crawls apply only to SharePoint sites, which means that if you are planning to index other content sources (such as File Shares or Exchange folders) your options are restricted to incremental and full crawl only.

Example scenario

The image below shows two situations. The image on the left (Scenario 1) shows incremental crawls scheduled to start every 15 minutes. The image on the right (Scenario 2) shows a similar scenario where continuous crawls are scheduled every 15 minutes. About 7 minutes after the first crawl starts, a user updates a document. Let’s also assume that in this case passing through all the items to check for updates takes 44 minutes.

Continuous crawl SharePoint 2013

Incremental vs continuous crawl in SharePoint 2013

In Scenario 1, although incremental crawls are scheduled every 15 minutes, a new incremental crawl cannot be started while another incremental crawl is running. The next incremental crawl will only start after the current one has finished. In this scenario the first incremental crawl takes 44 minutes to finish, after which the next incremental crawl kicks in, finds the updated document and sends it to the search index. This scenario shows that it could take around 45 minutes from the time the document was updated until it is available in search.

In Scenario 2, a new continuous crawl will start every 15 minutes, as multiple continuous crawls can run in parallel. The second continuous crawl will see the updated document and send it to the search index. By using the continuous crawl in this case, we have reduced the time it takes for a document to be available in search from around 45 minutes to 15 minutes.
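The two scenarios can be replayed with a small Python calculation. It is a simplified model resting on two assumptions of ours: a crawl that has already started does not pick up a later update, and a blocked incremental crawl waits for the next scheduled slot. Minutes are counted from the start of the first crawl, and the function name is our own.

```python
def first_crawl_after(update_at: int, interval: int, duration: int, parallel: bool) -> int:
    """Minute at which the first crawl that begins after the update is started."""
    if parallel:
        # Continuous: a new crawl starts at every slot, even if others are still running.
        return (update_at // interval + 1) * interval
    # Incremental: a crawl can only start at a slot after the previous crawl has finished.
    start = 0
    while start <= update_at:
        finish = start + duration
        start = -(-finish // interval) * interval  # round up to the next slot
    return start


# Update at minute 7, crawls scheduled every 15 minutes, a full pass takes 44 minutes.
print(first_crawl_after(7, 15, 44, parallel=False))  # incremental, Scenario 1
print(first_crawl_after(7, 15, 44, parallel=True))   # continuous, Scenario 2
```

With these numbers the model gives minute 45 for the incremental crawl and minute 15 for the continuous crawl, in line with the figures in the two scenarios. The time needed to actually process the document comes on top of that.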

Not enabled by default

Continuous crawls are not enabled by default. They are enabled in the same place as the incremental crawl: in Central Administration, under the Search Service Application, per content source. The interval at which a continuous crawl starts defaults to 15 minutes, but it can be changed through PowerShell to a minimum of 1 minute if required. Lowering the interval will, however, increase the load on the server. Another number to take into consideration is the maximum number of simultaneous requests, which is also configured from Central Administration.

Continuous crawl in Office 365

Unlike in SharePoint Server 2013, continuous crawls are enabled by default in SharePoint Online, but they are managed by Microsoft. For those used to Central Administration in the on-premise SharePoint server, it might be surprising that it is not available in SharePoint Online. Instead, there is a limited set of administrative features. Most of the search features can be managed from this administrative interface, though the ability to manage crawling of content sources is missing.

The continuous crawl in Office 365 offers little control or configuration. The crawl frequency cannot be modified; Microsoft targets between 15 minutes and one hour between a change and its availability in the search results, though in some cases it can take hours.

Closer to real-time indexing

The continuous crawl in SharePoint 2013 overcomes previous limitations of the incremental crawl by closing the gap between the time when a document is updated and when this is visible in the search index.

A different concept in this area is event driven indexing, which we will explain in our next blog post. Stay tuned!

Entity Recognition with Google Search Appliance 7.2

Introduction

In this article we would like to present some of the possibilities offered by the entity recognition option of Google Search Appliance (GSA). Entity recognition was introduced with the release of version 7.0 and improvements will still be added in future releases. We have used version 7.2 to write this blogpost and illustrate how GSA can perform named-entity recognition and sentiment analysis.

Entity Recognition in brief

Entity recognition enables the GSA to discover entities (such as names of people, places, organizations, products, dates, etc.) in documents where these are not available in the metadata, or more generally wherever they are needed to enhance the search experience (e.g. via faceted search/dynamic navigation). There are three ways of defining entities:

  • With a TXT format dictionary of entities, where each entity type is in a separate file.
  • With an XML format dictionary, where entities are defined by synonyms and regular expressions. Currently, the regular expressions only match single words.
  • With composite entities written as an LL1 grammar.

Example 1: Identifying people

The basic setup for recognition of person names is to upload a dictionary of first names and a dictionary of surnames. Then, you can create a composite entity full name by using a simple LL1 grammar rule, for example {fullname}::=[firstname] [surname]. Every first name in your dictionary, followed by a space and then followed by a surname will be recognized as a full name. With the same approach, you can define more complex full names such as:

{fullName}::= {Title set}{Name set}{Middlenames}{Surname set}
{Title set}::=[Title] {Title set}
{Title set} ::= [epsilon]
{Name set} ::= [Name] {Name set2}
{Name set2} ::= [Name] {Name set2}
{Name set2} ::= [epsilon]
{Middlenames} ::= [Middlename]
{Middlenames} ::= [epsilon]
{Surname set} ::= [Surname] {Surname set2}
{Surname set2} ::= [Surname] {Surname set2}
{Surname set2} ::= [epsilon]

A full name will be recognized if it matches zero or more titles, one or more first names, zero or one middle name and one or more surnames, all separated by a space (e.g. Dr John Anders Lee).
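To check what the grammar accepts, it can be mirrored in a small Python recognizer. This is only a sketch for reasoning about the rules, not how the GSA evaluates them. It assumes that the four dictionaries do not share any words, so that consuming tokens greedily from left to right gives the same result as the grammar; the dictionary contents are example values.

```python
from typing import List, Set


def is_full_name(tokens: List[str], titles: Set[str], names: Set[str],
                 middlenames: Set[str], surnames: Set[str]) -> bool:
    """Check a token sequence against {fullName} as defined by the grammar rules."""
    i = 0
    # {Title set}: zero or more titles
    while i < len(tokens) and tokens[i] in titles:
        i += 1
    # {Name set}: one or more first names
    start = i
    while i < len(tokens) and tokens[i] in names:
        i += 1
    if i == start:
        return False
    # {Middlenames}: zero or one middle name
    if i < len(tokens) and tokens[i] in middlenames:
        i += 1
    # {Surname set}: one or more surnames, and nothing may be left over
    start = i
    while i < len(tokens) and tokens[i] in surnames:
        i += 1
    return i > start and i == len(tokens)


titles, names = {"Dr"}, {"John"}
middlenames, surnames = {"Anders"}, {"Lee"}
print(is_full_name("Dr John Anders Lee".split(), titles, names, middlenames, surnames))
print(is_full_name("Dr Lee".split(), titles, names, middlenames, surnames))
```

The first call is accepted, while the second is rejected because a first name is mandatory.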

Limitations

  • All the names in the content will be matched
  • Common words similar to names will be matched. Example: Charlotte Stone. To reduce this limitation, you can enable the case sensitive option and match a full name
  • In the preceding example, Dr John Anders Lee and John Anders Lee will be recognized as different persons
  • No support for multiple entities within composite entities. John Anders Lee will be matched as a full name, but John will not be matched as a name.

Example 2: Identifying places

Place names such as cities, countries, streets can be easily defined with the help of dictionaries in TXT format. One can also define locations by using regular expressions, especially if these share the same substring (e.g. “street” or “square”). For example, a Swedish street will often contain the substring “gata”, meaning “street”:

<instance>
<name> Street </name>
<pattern>.*gatan</pattern>
<pattern>.*gata</pattern>
<pattern>.*torget</pattern>
<pattern>.*plats</pattern>
<pattern>.*platsen</pattern>
<store_regex_or_name>regex</store_regex_or_name>
</instance>

This will allow us to identify one-word places like “Storgatan” and “Järntorget”, but will fail in cases where the name consists of two or more words, such as “Olof Palmes plats”.
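The single-word behaviour can be tried out with Python's re module as a stand-in. The GSA's own regular expression engine may differ in details; the sketch only mimics the restriction by testing each whitespace-separated word on its own against the patterns from the dictionary above.

```python
import re
from typing import List

# Same patterns as in the XML dictionary entry for "Street".
STREET_PATTERNS = [r".*gatan", r".*gata", r".*torget", r".*plats", r".*platsen"]


def find_streets(text: str) -> List[str]:
    """Return the words that match one of the patterns in full."""
    return [
        word for word in text.split()
        if any(re.fullmatch(pattern, word) for pattern in STREET_PATTERNS)
    ]


print(find_streets("Möt mig vid Storgatan eller Järntorget nära Olof Palmes plats"))
```

In this sketch “Storgatan” and “Järntorget” are found, but of “Olof Palmes plats” only the single word “plats” matches, so the full place name is lost.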

Swedish postal codes can be defined with a regex matching 5 digits. Note, however, that every 5-digit number will be matched as a postal code, and that you cannot include a space in the postal code because regular expressions in the GSA only match a single word.

You can use the synonyms function of the XML dictionary to link a postal code with a city.

<instance>
<name> Göteborg </name>
<term>40330</term>
<term>40510</term>
<term>41190</term>
<term>41302</term>
<store_regex_or_name>name</store_regex_or_name>
</instance>

40330, 40510, 41190 and 41302 will be recognized as the entity Göteborg.
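The combination of a five-digit pattern and a synonym list can be sketched in Python as follows. The lookup table repeats the terms from the dictionary above; the function name, the "unknown" label and the sample sentence are our own, and the code only imitates the behaviour rather than reproducing the GSA.

```python
import re
from typing import Dict, List, Tuple

POSTAL_CODE = re.compile(r"\d{5}")  # five digits, no space

# Synonym list: each postal code is stored under the name of its city.
CITY_BY_CODE: Dict[str, str] = {
    "40330": "Göteborg",
    "40510": "Göteborg",
    "41190": "Göteborg",
    "41302": "Göteborg",
}


def tag_postal_codes(text: str) -> List[Tuple[str, str]]:
    """Tag every five-digit word with its city, or 'unknown' if it is not listed."""
    tags = []
    for word in text.split():
        if POSTAL_CODE.fullmatch(word):
            tags.append((word, CITY_BY_CODE.get(word, "unknown")))
    return tags


print(tag_postal_codes("Leverans till 41302 ordernummer 12345"))
```

The order number 12345 is picked up by the pattern as well, which illustrates why any five-digit number risks being treated as a postal code when only the regular expression is used.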

You can also use the synonyms to describe a territory division (kommun, län, country).

<instances>
   <instance>
     <name> Göteborg Stad</name> 
     <term> Angered </term>
     <term> Backa </term>
     <term> Göteborg </term>
     <term> Torslanda </term>
     <term> Västra Frölunda </term>
   </instance> 
   <instance>
     <name> Öckerö </name>
     <term> Hönö </term>
     <term> Öckerö </term> 
     <term> Rörö </term>
   </instance>
</instances>

Example 3: Sentiment analysis

Sentiment analysis aims at identifying the predominant mood (happy/sad, anger/happiness, positive/negative, etc) of a document by analyzing its content. Here we will show you a simple case of identifying positive vs negative mood in a document.

Basic analysis

For a basic analysis one can create two dictionaries, one with positive words (good, fine, excellent, like, love …) and one with negative words (bad, dislike, don’t, not …). Such an analysis is simplistic and very limited for the following reasons:

  • There is no real grammar
  • Limited coverage of the lexicons
  • No degree of judgment
  • No global analysis of the document (if a document has 3 different polarity words it will be tagged with 3 different categories)

Analysis with grammar

If you add a dictionary of negations, you can create a more powerful tool with just a small grammar of composite entities. For example, {en negative} ::= [en negation] [en positive word] will correctly identify the English “not good”, “don’t like” and “didn’t succeed” as negative terms. One can certainly create deeper analysis with a more advanced grammar: you can specify special dictionaries for gender, emphatic words, nouns, verbs, adjectives, etc. and build composite entities and grammar rules with them. Below you see an example of the application of a simple grammar.
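The effect of the negation rule can be imitated in Python to see which phrases it would catch. The sketch assumes lowercased, whitespace-separated tokens and two small example dictionaries; it is an approximation for illustration, not the GSA's matching logic.

```python
from typing import List, Set, Tuple


def tag_sentiment(tokens: List[str], negations: Set[str],
                  positive: Set[str]) -> List[Tuple[str, str]]:
    """Tag 'negation + positive word' as negative, and a lone positive word as positive."""
    tags = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if word in negations and following in positive:
            # Composite entity: [en negation] [en positive word]
            tags.append((word + " " + following, "negative"))
            i += 2
        elif word in positive:
            tags.append((word, "positive"))
            i += 1
        else:
            i += 1
    return tags


negations = {"not", "don't", "didn't"}
positive = {"good", "like", "succeed"}
sentence = "the food was not good but i like the service"
print(tag_sentiment(sentence.split(), negations, positive))
```

Here “not good” is tagged as negative and “like” as positive, whereas the basic dictionary approach would have tagged “good” as positive.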

[Screenshot: example of the application of a simple grammar]

Degrees of sentiment

You can also add degrees of sentiment using the synonyms feature.

<instances>
 <instance>
  <name> Good </name>
  <term> good </term>
  <term> fine </term>
  <term> like </term>
 </instance>
 <instance>
  <name> Very Good </name>
  <term> excellent </term>
  <term> amazing </term>
  <term> great </term>
 </instance>
 <instance>
  <name> Bad </name>
  <term> bad </term>
  <term> dislike </term>
  <term> don’t </term>
  <term> can’t </term>
  <term> not </term>
 </instance>
 <instance>
  <name> Very Bad </name>
  <term> awful </term>
  <term> hate </term>
 </instance>
</instances>

Note, however, that you cannot combine such synonym entries with other entity dictionaries or grammar rules.

Limitations

There are some limitations of this approach as well:

  • No possibility to extract global sentiment for a given document. You cannot count in a document how many terms are matched as good and how many are matched as bad and then define the global sentiment for this document. However, when the regular expression limitations are fixed, one will be able to do so.
  • As with sentiment analysis in general and other dictionary-based approaches, it is hard to detect sarcasm and irony.
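The counting described in the first limitation is simple to express outside the GSA. The Python sketch below shows what such a global score could look like if the matched terms were available to a separate processing step, which is an assumption on our part; the polarity dictionary and the majority rule are illustrative choices.

```python
from collections import Counter
from typing import Dict, List


def document_sentiment(tokens: List[str], polarity: Dict[str, str]) -> str:
    """Count positive and negative terms and return the majority label."""
    counts = Counter(polarity[token] for token in tokens if token in polarity)
    positive, negative = counts["positive"], counts["negative"]
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


polarity = {"good": "positive", "excellent": "positive", "bad": "negative", "awful": "negative"}
review = "good food excellent service but awful parking"
print(document_sentiment(review.split(), polarity))
```

Two positive terms against one negative term give an overall positive label for this example document.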

Conclusion

In this blog post we showed how one can use the Entity recognition feature of GSA 7.2. While there are still some limitations of the tools provided, they are mature enough to enhance your search solution. Depending on the type of data, one can do simple sentiment analysis as well as more complex recognition of entities by using LL1 grammar.

A nice add-on to the Entity recognition setup in the GSA would be the possibility to load pre-trained models for Named Entity Recognition or sentiment analysis.

Links

Entity recognition with GSA:

http://www.google.com/support/enterprise/static/gsa/docs/admin/72/admin_console_help/crawl_entity_recognition.html

Dynamic navigation:

http://www.google.com/support/enterprise/static/gsa/docs/admin/72/admin_console_help/serve_dynamic_navigation.html