Continuous crawl in SharePoint 2013

Continuous crawl is one of the new features in SharePoint 2013. As an alternative to incremental crawl, it promises to improve the freshness of the search results, that is, to shorten the time between when an item is updated in SharePoint by a user and when it becomes available in search.

Understanding how this new functionality works is especially important for SharePoint implementations where content changes often and/or where content must be searchable immediately. Moreover, since many of the new SharePoint 2013 features depend on search (for example the social features, popular items, and the content by search web parts), understanding continuous crawl and planning accordingly can help align user expectations with the technical capabilities of the search engine.

Both the incremental crawl and the continuous crawl look for items that were added, changed or deleted since the last successful crawl, and update the index accordingly. The continuous crawl, however, overcomes a key limitation of the incremental crawl: multiple continuous crawls can run at the same time, whereas a new incremental crawl starts only after the previous incremental crawl has finished.

Limitation to content sources

Content not stored in SharePoint does not benefit from this new feature. Continuous crawls apply only to SharePoint sites, which means that if you are planning to index other content sources (such as file shares or Exchange folders), your options remain restricted to incremental and full crawls.

Example scenario

The image below shows two situations. On the left (Scenario 1), incremental crawls are scheduled to start every 15 minutes. On the right (Scenario 2), continuous crawls are scheduled to start every 15 minutes. About 7 minutes after the first crawl starts, a user updates a document. Let's also assume that passing through all the items to check for updates takes 44 minutes.


Incremental vs continuous crawl in SharePoint 2013

In Scenario 1, although incremental crawls are scheduled every 15 minutes, a new incremental crawl cannot start while another incremental crawl is running. The next incremental crawl will only start after the current one has finished. In this scenario, the first incremental crawl takes 44 minutes to finish, after which the next incremental crawl kicks in, finds the updated document, and sends it to the search index. It therefore takes around 45 minutes from the time the document was updated until it is available in search.

In Scenario 2, a new continuous crawl starts every 15 minutes, since multiple continuous crawls can run in parallel. The second continuous crawl sees the updated document and sends it to the search index. By using the continuous crawl in this case, we have reduced the time it takes for a document to be available in search from around 45 minutes to around 15 minutes.
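The arithmetic behind the two scenarios can be sketched in a few lines of code. This is a back-of-the-envelope model using the numbers assumed above, not actual SharePoint behaviour:

```python
import math

CRAWL_DURATION = 44   # minutes for one full pass over the content
INTERVAL = 15         # minutes between scheduled crawl starts
UPDATE_TIME = 7       # minute at which the user updates the document

# Scenario 1: incremental crawls cannot overlap, so the next crawl
# only starts once the crawl that was running at UPDATE_TIME finishes.
incremental_pickup = CRAWL_DURATION  # the crawl started at minute 0 finishes here

# Scenario 2: continuous crawls overlap, so the update is picked up
# by the first crawl that starts after the document was changed.
continuous_pickup = math.ceil(UPDATE_TIME / INTERVAL) * INTERVAL

print(incremental_pickup)  # 44 -> the update reaches the index around minute 45
print(continuous_pickup)   # 15 -> the update reaches the index around minute 15
```

The gap between the two numbers grows with the crawl duration, which is why continuous crawl matters most for large, frequently changing content sources.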

Not enabled by default

Continuous crawls are not enabled by default. They are enabled in the same place as incremental crawls: in Central Administration, under the Search Service Application, per content source. The interval at which a new continuous crawl starts defaults to 15 minutes, but it can be lowered through PowerShell to a minimum of 1 minute if required. Lowering the interval will, however, increase the load on the server. Another setting to take into consideration is the maximum number of simultaneous requests, which is also configured in Central Administration.

Continuous crawl in Office 365

Unlike SharePoint Server 2013, SharePoint Online has continuous crawls enabled by default, but they are managed by Microsoft. For those used to Central Administration in the on-premises SharePoint server, it might be surprising that it is not available in SharePoint Online. Instead, there is a limited set of administrative features: most of the search features can be managed from this administrative interface, but the ability to manage crawling of content sources is missing.

The continuous crawl in Office 365 offers little control and configuration. The crawl frequency cannot be modified; Microsoft targets between 15 minutes and one hour between a change and its availability in the search results, though in some cases it can take hours.

Closer to real-time indexing

The continuous crawl in SharePoint 2013 overcomes previous limitations of the incremental crawl by narrowing the gap between the time a document is updated and the time the update becomes visible in the search index.

A related concept in this area is event-driven indexing, which we will explain in our next blog post. Stay tuned!

Entity Recognition with Google Search Appliance 7.2


In this article we present some of the possibilities offered by the entity recognition option of the Google Search Appliance (GSA). Entity recognition was introduced with version 7.0, and improvements are still being added in new releases. We used version 7.2 to write this blog post and to illustrate how the GSA can perform named-entity recognition and sentiment analysis.

Entity Recognition in brief

Entity recognition enables the GSA to discover entities (such as names of people, places, organizations, products, dates, etc.) in documents where these are not available as metadata, in order to enhance the search experience (e.g. via faceted search/dynamic navigation). There are three ways of defining entities:

  • With a TXT format dictionary of entities, where each entity type is in a separate file.
  • With an XML format dictionary, where entities are defined by synonyms and regular expressions. Currently, the regular expressions only match single words.
  • With composite entities written as an LL1 grammar.

Example 1: Identifying people

The basic setup for recognizing person names is to upload a dictionary of first names and a dictionary of surnames. Then you can create a composite entity, full name, using a simple LL1 grammar rule, for example {fullname} ::= [firstname] [surname]. Every first name in your dictionary, followed by a space and a surname from your dictionary, will be recognized as a full name. With the same approach, you can define more complex full names such as:

{fullName}::= {Title set}{Name set}{Middlenames}{Surname set}
{Title set}::=[Title] {Title set}
{Title set} ::= [epsilon]
{Name set} ::= [Name] {Name set2}
{Name set2} ::= [Name] {Name set2}
{Name set2} ::= [epsilon]
{Middlenames} ::= [Middlename]
{Middlenames} ::= [epsilon]
{Surname set} ::= [Surname] {Surname set2}
{Surname set2} ::= [Surname] {Surname set2}
{Surname set2} ::= [epsilon]

A full name will be recognized if it matches zero or more titles, one or more first names, zero or one middle name and one or more surnames, all separated by spaces (e.g. Dr John Anders Lee).
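To get a feel for what the simple {fullname} ::= [firstname] [surname] rule does, here is a toy re-implementation in Python. It is illustrative only (the GSA applies such rules internally against uploaded TXT dictionaries; the word lists here are made up):

```python
# Hypothetical mini-dictionaries; in the GSA these would be uploaded TXT files.
FIRST_NAMES = {"John", "Anders", "Charlotte"}
SURNAMES = {"Lee", "Stone"}

def find_full_names(text):
    """Match {fullname} ::= [firstname] [surname]: a dictionary first name
    immediately followed by a dictionary surname."""
    tokens = text.split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])
            if a in FIRST_NAMES and b in SURNAMES]

print(find_full_names("Dr John Lee met Charlotte Stone yesterday"))
# -> ['John Lee', 'Charlotte Stone']
```

Note how "Charlotte Stone" is matched even though "stone" is an ordinary word, which previews the limitations listed below.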


A few remarks on this approach:

  • All the names in the content will be matched.
  • Common words identical to names will be matched, for example "Charlotte Stone" (where "stone" is also an ordinary word). To reduce this, you can enable the case-sensitive option and match full names only.
  • In the preceding example, Dr John Anders Lee and John Anders Lee will be recognized as two different persons.
  • There is no support for nested entities within composite entities: John Anders Lee will be matched as a full name, but John will not also be matched as a name.


Example 2: Identifying places

Place names such as cities, countries and streets can easily be defined with the help of dictionaries in TXT format. One can also define locations using regular expressions, especially when the names share a common substring (e.g. "street" or "square"). For example, a Swedish street name will often contain the substring "gata", meaning "street", so a Street entity can be defined in the XML dictionary with a regular expression term matching that substring:

<name> Street </name>

This allows us to identify one-word places like "Storgatan" or "Järntorget", but will fail where the name consists of two or more words, such as "Olof Palmes plats".

Swedish postal codes can be defined with a regex matching five digits. Note, however, that any five-digit number will then be matched as a postal code, and that you cannot include a space in the postal code, since the GSA's regular expressions only match single words.
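The five-digit rule, and its over-matching, are easy to verify with an ordinary regex. Python is used here purely for illustration; in the GSA the pattern would go into the XML dictionary:

```python
import re

# A Swedish postal code written without a space: exactly five digits.
POSTAL_CODE = re.compile(r"^\d{5}$")

print(bool(POSTAL_CODE.match("41190")))   # True  - a real Göteborg postal code
print(bool(POSTAL_CODE.match("12345")))   # True  - but any five digits match too
print(bool(POSTAL_CODE.match("411 90")))  # False - the space breaks the match,
                                          #         mirroring the GSA single-word limit
```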

You can use the synonyms function of the XML dictionary to link postal codes with a city:

<name> Göteborg </name>
<term> 40330 </term>
<term> 40510 </term>
<term> 41190 </term>
<term> 41302 </term>

With this entry, 40330, 40510, 41190 and 41302 will all be recognized as the entity Göteborg.

You can also use synonyms to describe a territorial division (kommun, län, country):

     <name> Göteborg Stad</name> 
     <term> Angered </term>
     <term> Backa </term>
     <term> Göteborg </term>
     <term> Torslanda </term>
     <term> Västra Frölunda </term>
     <name> Öckerö </name>
     <term> Hönö </term>
     <term> Öckerö </term> 
     <term> Rörö </term>



Example 3: Sentiment analysis

Sentiment analysis aims at identifying the predominant mood (happy/sad, positive/negative, etc.) of a document by analyzing its content. Here we show a simple case of identifying positive versus negative mood in a document.

Basic analysis

For a basic analysis, one can create two dictionaries: one with positive words (good, fine, excellent, like, love, …) and one with negative words (bad, dislike, don't, not, …). Such an analysis is simplistic and very limited, for the following reasons:

• There is no real grammar
• Limited coverage of the lexicons
• No degree of judgment
• No global analysis of the document (if a document has 3 different polarity words it will be tagged with 3 different categories)
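In code terms, the basic dictionary approach boils down to independent per-token lookups, which makes the last limitation obvious. This is a toy sketch with made-up word lists, not GSA functionality:

```python
POSITIVE = {"good", "fine", "excellent", "like", "love"}
NEGATIVE = {"bad", "dislike", "don't", "not"}

def tag_sentiment(text):
    """Tag each dictionary word independently - no grammar, no global view."""
    tags = []
    for token in text.lower().split():
        if token in POSITIVE:
            tags.append((token, "positive"))
        elif token in NEGATIVE:
            tags.append((token, "negative"))
    return tags

# One document, three polarity words -> three separate, possibly conflicting tags.
print(tag_sentiment("good camera but bad battery don't buy"))
# -> [('good', 'positive'), ('bad', 'negative'), ("don't", 'negative')]
```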


Analysis with grammar

If you add a dictionary of negations, you can create a more powerful tool with just a small grammar of composite entities. For example, {en negative} ::= [en negation] [en positive word] will correctly identify the English "not good", "don't like" and "didn't succeed" as negative terms. One can certainly create deeper analyses with more advanced grammars: you can specify dedicated dictionaries for gender, emphatic words, nouns, verbs, adjectives, etc., and build composite entities and grammar rules with them.
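The effect of the {en negative} ::= [en negation] [en positive word] rule can be mimicked with a two-token window. Again a toy sketch with invented word lists; in the GSA this is expressed as a composite-entity grammar, not code:

```python
NEGATIONS = {"not", "don't", "didn't"}
POSITIVE = {"good", "like", "succeed"}

def classify_bigrams(text):
    """Apply {en negative} ::= [en negation] [en positive word]:
    a negation directly before a positive word flips its polarity."""
    tokens = text.lower().split()
    entities = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] in NEGATIONS and tokens[i + 1] in POSITIVE:
            entities.append((f"{tokens[i]} {tokens[i + 1]}", "negative"))
            i += 2  # consume both tokens of the composite entity
        elif tokens[i] in POSITIVE:
            entities.append((tokens[i], "positive"))
            i += 1
        else:
            i += 1
    return entities

print(classify_bigrams("I don't like it but the camera is good"))
# -> [("don't like", 'negative'), ('good', 'positive')]
```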


Degrees of sentiment

You can also add degrees of sentiment by using the synonyms feature:

  <name> Good </name>
  <term> good </term>
  <term> fine </term>
  <term> like </term>
  <name> Very Good </name>
  <term> excellent </term>
  <term> amazing </term>
  <term> great </term>
  <name> Bad </name>
  <term> bad </term>
  <term> dislike </term>
  <term> don’t </term>
  <term> can’t </term>
  <term> not </term>
  <name> Very Bad </name>
  <term> awful </term>
  <term> hate </term>

Note, however, that you cannot combine such synonym entries with other entity dictionaries or grammar rules.
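Conceptually, the synonym entries above define nothing more than a term-to-degree mapping, which could be written as a plain dictionary (the word lists are copied from the example above; the code is an illustration, not the GSA):

```python
SENTIMENT_DEGREE = {
    # <name> Good </name>
    "good": "Good", "fine": "Good", "like": "Good",
    # <name> Very Good </name>
    "excellent": "Very Good", "amazing": "Very Good", "great": "Very Good",
    # <name> Bad </name>
    "bad": "Bad", "dislike": "Bad", "don't": "Bad", "can't": "Bad", "not": "Bad",
    # <name> Very Bad </name>
    "awful": "Very Bad", "hate": "Very Bad",
}

def degree_of(token):
    """Return the sentiment degree for a term, or None if it is unknown."""
    return SENTIMENT_DEGREE.get(token.lower())

print(degree_of("amazing"))  # Very Good
print(degree_of("hate"))     # Very Bad
```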



This approach has some limitations as well:

  • There is no way to extract a global sentiment for a document: you cannot count how many terms in a document are matched as good and how many as bad and then derive the document's overall sentiment. Once the regular expression limitations are fixed, however, this should become possible.
  • As with sentiment analysis in general and other dictionary-based approaches, sarcasm and irony are hard to detect.


In this blog post we have shown how to use the entity recognition feature of GSA 7.2. While the tools provided still have some limitations, they are mature enough to enhance your search solution. Depending on the type of data, one can do simple sentiment analysis as well as more complex entity recognition using LL1 grammars.

A nice add-on to the entity recognition setup in the GSA would be the possibility to load pre-trained models for named-entity recognition or sentiment analysis.



Reaching Findability #6

Findability is surprisingly complex due to the large number of measures that need to be understood and undertaken. I believe one of the principal challenges lies within the pedagogical domain. This is my sixth and last post in a series of simple tips for reaching Findability.

Understand your business case!

Many decisions are based on gut feeling rather than solid business cases. Improving Findability is often a good investment, and being able to back that statement up with numbers makes it easier to move forward with the decision.

For some applications of search it is relatively easy to show the benefits in numbers: improving Findability on a web page can increase conversion rates and online sales, or decrease the need for customer support, to mention a few examples.

It is, however, more difficult to put numbers on internal applications of search. Finding information more quickly and easily obviously saves time, but is time always worth money? I believe the foremost benefit may be that decisions can be made based on better information.

If finding relevant information is made quick and easy enough, the willingness to actually spend time on it for decision-making will increase. Search can give you a better overview of what information is available, provide you with information you didn't know existed or previously didn't have access to, and serve information you might be interested in because of your position and context.

A solid business case for working with search can be more or less difficult to find. That does not, however, remove the need to work on it. One benefit is keeping focus on what really matters to your organization; another is convincing others that new ways of working with information, as a strategic asset, matter. Fortunately, there are good methods to find, express and measure the benefits of Findability to make your business case solid!

For some inspiration and insight into the general state of findability you can download and read the report from our global survey, Enterprise Search & Findability Survey 2013.

If you are interested in more details about how to achieve Findability you can also download and read our whitepaper “Best practices for search”!

How relevance models work

A relevance model is what a search engine uses to rank documents in a search result, i.e. how it finds the document you are looking for. An axiomatic analysis of relevance models asks two questions: how and why does a relevance model work? Findwise attended the ICTIR 2013 conference in Copenhagen, where one of the recurring topics was the axiomatic analysis of relevance models.

A relevance model is represented as a mathematical function of a set of input variables, so just by looking at its formula it is very difficult to answer those two questions. What the axiomatic analysis aims to do is break down the formula, isolating and analyzing each of its individual components, with the goal of improving retrieval performance.

The idea is to formulate a set of axioms: laws that a relevance model should abide by. One of the more obvious axioms, from a purely statistical point of view, relates to term frequency (TF): a document d1, in which the query terms occur more often than in some other document d2, should be assigned a higher relevance than d2. These are called axioms because they should be relevance truths, statements that are obvious and that everyone can agree on. Other examples of axioms could be that very long documents should be penalized, simply because they have a higher probability of containing any word, and that terms frequent in many documents should contribute less to the relevance than terms that are more unique.
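Axioms like these can be checked mechanically against a concrete scoring function. The sketch below uses a minimal TF-IDF scorer of my own (not a model from the conference) and verifies the term-frequency axiom on a toy corpus:

```python
import math

def tfidf_score(query_terms, doc, corpus):
    """Minimal TF-IDF scoring with document-length normalization."""
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        tf = doc.count(term) / len(doc)           # normalized term frequency
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log((n_docs + 1) / (df + 1))   # rarer terms weigh more
        score += tf * idf
    return score

corpus = [
    ["search", "engine", "ranking", "basics"],   # d1: one occurrence of "ranking"
    ["ranking", "models", "ranking", "axioms"],  # d2: two occurrences of "ranking"
    ["unrelated", "cooking", "recipes", "here"],
]

# TF axiom: d2 mentions the query term more often (same length), so it must score higher.
print(tfidf_score(["ranking"], corpus[1], corpus) > tfidf_score(["ranking"], corpus[0], corpus))  # True
```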

From an Enterprise Search perspective, these axioms do not have to be general relevance truths; they can be adapted to your organization and your users. Here we see a shift from purely statistics-based axioms towards more metadata-based ones, e.g. which fields are more relevant than others and which sources are more relevant. A very simple example is that a hit in the title is more relevant than a hit in the body. These are usually conveniently configurable in most search engines, e.g. Apache Solr.
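A field-level axiom of that kind translates directly into weighted field scoring, which is essentially what a Solr boost configuration such as `qf=title^2 body^1` expresses. The scorer and weights below are an illustrative sketch, not Solr internals:

```python
FIELD_WEIGHTS = {"title": 2.0, "body": 1.0}  # assumed weights: title hits count double

def field_score(query_terms, doc):
    """Sum per-field term hits, boosted by the field weight."""
    return sum(
        weight * sum(doc.get(field, "").lower().split().count(term) for term in query_terms)
        for field, weight in FIELD_WEIGHTS.items()
    )

doc_title_hit = {"title": "vacation policy", "body": "general information"}
doc_body_hit = {"title": "general information", "body": "vacation policy"}

# Field axiom: the same hit in the title outranks a hit in the body.
print(field_score(["vacation"], doc_title_hit))  # 2.0
print(field_score(["vacation"], doc_body_hit))   # 1.0
```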

This concept is useful and interesting for many reasons, since it not only allows you to modify and improve existing relevance models but also to create new ones from scratch. The process can even be automated using machine learning algorithms, which leaves us with the task of finding the optimal set of axioms. Can you think of axioms that apply to your organization, your users' information needs and the content that is made searchable?

Reaching Findability #5

Findability is surprisingly complex due to the large number of measures that need to be understood and undertaken. I believe one of the principal challenges lies within the pedagogical domain. This is my fifth post in a series of simple tips for reaching Findability.

Effect driven development!

Most projects are undertaken to achieve operational improvements in an organization. An often-relied-upon truth is that IT-driven, or what I'd like to call "feature-driven", projects are more likely to fail than projects with a clear focus on business benefits. Many pitfalls can be avoided by using a structured approach while making business benefits the centre of attention.

An excellent tool for keeping focus on satisfying real business needs, rather than developing features only a few will use, is Effect Mapping, developed by InUse AB. The basic idea of this method is to define what actual business effects the project should produce, and to identify the target groups contributing to these effects and their specific needs. It is all visualized in a so-called Effect Map.

The Effect Map is a great tool for communicating goals and means needed to achieve them. It can be used throughout an entire project to manage changing requirements, keep the business in focus and prioritize what really matters; all while new knowledge is gained.

Development of an Effect Map takes place during a series of workshops, providing valuable insights to any project. It is especially helpful in Findability projects, as they span many different parts of any business, from managing information and the organization to technical solutions.

A good strategy used in conjunction with an Effect Map and long-term goals provides you with a great starting point for succeeding with your Findability. You can read more about different approaches to building the business case in this blog post!

Swedish language support (natural language processing) for IBM Content Analytics (ICA)

Findwise has now extended the NLP (natural language processing) in ICA to include both support for Swedish PoS tagging and Swedish sentiment analysis.

IBM Content Analytics with Enterprise Search (ICA) has its strength in natural language processing (NLP), which is carried out in the UIMA pipeline. From a Swedish perspective, one concern with ICA has always been its lack of NLP for Swedish. Previously, the Swedish support in ICA consisted only of dictionary-based lemmatization (word: "sprang" -> lemma: "springa"), while for a number of other languages ICA has also provided part-of-speech (PoS) tagging and sentiment analysis. One of the benefits of a PoS tagger is its ability to disambiguate words that belong to multiple classes (e.g. "run" can be both a noun and a verb), as well as to assign tags to words that are not found in the dictionary. Furthermore, the PoS tagger is crucial for improving entity extraction, which is important when a deeper understanding of the indexed text is needed.

The two images below show simple examples of the PoS support.

Example: ICA uses NLP to analyse the string "ICA är en produkt som klarar entitetsextrahering"
Example: ICA uses NLP to analyse the string "Watson deltog i jeopardy"

The question is: how can this extended functionality be used?

IBM uses ICA and its NLP support together with several of their products. The Jeopardy-playing computer Watson may be the most famous example, even if it is not a real product. Watson used NLP in its UIMA pipeline when it analyzed data from sources such as Wikipedia and IMDb.

One product which leverages ICA and its NLP capabilities is Content and Predictive Analytics for Healthcare. This product helps doctors determine which action to take for a patient, given the patient's medical record and symptoms. By also leveraging the predictive analytics from SPSS, it is possible to suggest the next action for the patient.

ICA can also be connected directly to IBM Cognos or SPSS, with ICA as the tool that brings structure to unstructured data. By using the NLP or sentiment analytics in ICA, structured data can be extracted from text documents and then fed to IBM Cognos, SPSS, or non-IBM products such as Splunk.

ICA can also be used on its own as a text miner or a search platform, but in many cases it delivers its maximum value together with other products. ICA helps enrich data by bringing structure to unstructured content; the processed data can then be used by other products that normally work with structured data.

Speaking at the Agile HR Conference in Stockholm

In my role as process coordinator for Talent Management, I will be speaking at the Agile HR Conference in Stockholm on November 27. My talk will focus on our approach to Talent Management and the tools we use. Findwise continuously strives to improve the Talent Management process, with the ambition to attract, develop, engage and retain excellent talent. The work during 2013 has resulted in a new version of our Talent Management process and the tools we use.

Findwise speaking at Agile HR

Our view on talent management is this: talent attracts talent. To find new talented people, we must make the ones who are already working here thrive and feel inspired. How do we accomplish this? Well, talented people must be respected as equals and given the freedom to create and innovate. You don't hire a talent to tell him or her how to do what they are talented at; that would be like hiring Michael Jackson and telling him how to write a hit song. We want our people to feel encouraged to act independently and bravely. That is how their talents are best put to use for Findwise and our clients.

Hope to see you at the conference!

Intranets that have an impact

Recently I attended EuroIA, the European information architecture summit, where experts in the field meet up to discuss, share, listen and learn.

For me, one of the highlights was James Robertson from Step Two Designs, presenting some of the results from their yearly intranet awards. Intranets are fascinating: large systems with great potential to improve daily work, yet more often than not they fail to do so. As James Robertson put it, "organizations and intranets are where user experience goes to die". So, what can we do to change that?

Robertson talked about successful companies managing to create structured, social and smart intranets. Two examples were the International Monetary Fund and a Canadian law firm. Both needed easy and secure gathering and retrieval of large amounts of information. Part of their success came from mandatory classification of published documents and review of changes. Another smart solution was to keep a connection between parent documents and their derivatives, making sure that information was trustworthy and kept up to date.

Companies that excelled at social managed to bind everything together: people, projects and customers. I was happy to hear this, as we have been working a lot on this at Findwise. Our latest internal project was actually creating our own knowledge graph, connecting skills, platforms and technologies with projects and customers. What we haven't done yet, but other successful companies have, is daring to go all in with social. Instead of providing social functionality on the side, they fully integrate their social feed into the intranet start page. This I'd like to try at Findwise.

The ugliest but smartest solution presented by James combined analytics with proper tagging of information. Imagine the following: a policy is changed and you are informed. However, you don't need the policy until you perform a task months later. By then, the policy information is hidden in a news archive and you can't easily find it. Annoying, right?

What CRS Australia does to solve this problem is simple and elegant. They track pages users visit on the intranet. Whenever someone updates a page they enter whether it is a significant change or not. This is combined with electronic forms for everything. When filling in a form, information regarding policy updates pop up automatically, ensuring that users always have up to date information.

These ideas give me hope and clearly show that intranets needn’t be a place where user experience comes to die.

Reaching Findability #4 – Build for the long term!

Findability is surprisingly complex due to the large number of measures that need to be understood and undertaken. I believe one of the principal challenges lies within the pedagogical domain. This is my fourth post in a series of simple tips for reaching Findability.

Build for the long term!

A platform for Findability takes time to build. It is partly about technology development but equally about organisational maturity.

Mechanisms for managing both information and search technology need to be established and adopted. The biggest effect though, is realized when people start thinking about information differently. When they start wanting to share and find information more easily, and desire the possibility to do so.

As with any change project, it helps to not make too many changes at the same time. It is often easier to establish a long-term goal and take small steps along the way. To reach the goal of Findability, the first step is to define a Findability strategy. While information is refined and the technical platform is developed step-by-step, the organization is allowed time to mature.

Choose your technical search platform carefully, and think long-term based on the specific requirements of your business. Give priority to supporting the processes and target groups that stand to receive the most tangible benefits from finding information more easily and quickly. One valuable method for defining goals, target group needs and the means to fulfill them is Effect Mapping, developed by InUse AB, which can be used early in the transformation to gain and communicate important insights.

The technical architecture, as with the information structure, must be well thought out when laying the foundation of the platform. Start out with a first application, perhaps an intranet or public web search. The extent and influence of the search platform can then be gradually built out by adding new information sources and components in accordance with the long-term plan. With the right priorities, business value is created every step of the way. New ideas can be tested and problems mitigated before the consequences become difficult to handle.

A good example of an organization with target group focused development is Municipality of Norrköping. You can watch an entertaining presentation of how they do it here!

Report from the 4th International Conference on the Theory of Information Retrieval

Findwise sponsored the 4th International Conference on the Theory of Information Retrieval (ICTIR) that took place in Copenhagen 29 September – 2 October 2013. The scope of the conference is to present the latest research and promote the exchange of ideas on the theory and foundations of Information Retrieval (IR). Findwise was at the conference to pick up theoretical ideas and bring them into practice at customers.


Findwise sponsoring ICTIR 2013



Is There Space For Theory In Modern Commercial Search Engines?

Ricardo Baeza-Yates (Yahoo! Labs, Spain) gave a keynote titled Is There Space For Theory In Modern Commercial Search Engines? An interesting question, to which the answer was an expected yes. He quoted Donald E. Knuth to support this answer: "the best theory is inspired by practice and the best practice is inspired by theory" ("Theory and Practice", Theoretical Computer Science, 1991).

His presentation then focused on predictive algorithms and how these can be applied to challenges in web search. Two examples illustrated the case for predictive algorithms in information retrieval: (1) tier prediction and (2) query intent prediction.

In the first example, the task is to predict which corpus of documents to search, in order to provide faster answer times for a given query. It is often the case, especially in large international organizations, that document indexes are partitioned to provide better response times. The task is then to predict which partition to search based on the query (without actually running it). Using machine learning, a corpus predictor chooses which corpus to search, retrieves results from that corpus and then assesses whether the choice was right; if wrong, it tries to correct the action. The decrease in answer time, however, means an increase in infrastructure costs (read more about tier prediction and the cost-efficiency trade-off in this article).

In the second example, the task is to predict the user intent given a query. Most work on query intent identification considers only one or a few facets of the query (its topic, for example, or its informational, navigational or transactional nature), and Ricardo noted that the query is just "the tip of the iceberg" when it comes to understanding user intent. Given a user query, he proposes classifying it along multiple facets, specifically the following nine dimensions: genre, topic, task, objective, specificity, scope, authority sensitivity, spatial sensitivity and time sensitivity (see the photo below of his slide on how these facets are defined and which values each takes). You can read more about query intent prediction in one of his invited talks.


Photo taken at ICTIR 2013

IR research – challenges and long-range opportunities

The conference also included a panel discussion on the challenges and future of IR. The panel members were Stephen Robertson (University College London, UK), Thomas Roelleke (Queen Mary, University of London, UK), ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA), Ricardo Baeza-Yates (Yahoo! Labs, Spain) and Peiling Wang (University of Tennessee, USA). Here are the main challenges the panel raised.

Small details matter. Thomas Roelleke compared the techniques used in golf with information retrieval models: small details, such as how the player grips the golf club, make a big difference in the result of the shot. Similarly, small changes in the theoretical models used in IR produce significant differences in the way the search results are ranked. For example, a small difference in the notation of frequencies in the formulas behind the ranking algorithms can lead to different results in the implementation.

There is a big dependency on text in IR. The way the user interacts with a retrieval system is through text, and current theoretical models in IR rely heavily on this medium. Text, and the tools that let the user interact with the search engine (mostly the keyboard and the mouse), have determined how retrieval systems are built and evaluated, as opposed to a situation where tools and systems are built based on user needs (which also relates to the next challenge). Voice is one medium that could be used more often in the future, especially since the user information need is turning into simply a user need, and the context of the search is becoming more important for identifying the user intent. Emerging examples are Apple's Siri and Google's Glass projects.

Adaptability of users versus adaptability of the system. Should we build systems that adapt to the user, or do users adapt to the systems we build? For example, if the empty string "" is still among the most common queries in your search log, what does this say about your users? Looked at from the user's perspective, the interface provides a colorful button that asks to be pressed for an action to occur. Given this, one interesting question is how the system should react to an empty string, i.e. what assumptions can be made about the user intent behind it?

Specialised search. There are specific cases of search on which only a limited amount of research has been done, desktop search and enterprise search being two examples, as most published research is based on experiments done for web search. Theoretical models for these specialised searches are even sparser; Stephen Robertson actually suggested that enterprise search "needs even more theory than web search". Enterprise search also differs in the way the performance of the search can be evaluated: in intranet search, a user is often able to pinpoint exactly which document is relevant for a search query, which is rarely the case when evaluating web search.

An interesting consequence of this, if we connect it back to the foundational retrieval models discussed over and over throughout the conference, is that users are most of the time unfamiliar with how the search engine determines the order of the search results. This can create frustration amongst users. So there should be a trade-off in the details presented to users: inform them about the assumptions made by the search engine, but at the same time hide the small details.


This post has covered only some of the highlights of the conference; upcoming blog posts will cover some of the topics in more detail.