Cloud vs. on-premise SharePoint 2013 search

Search in SharePoint 2013 – Part 1: The difference between search within on-premise SharePoint 2013 and SharePoint Online

Cloud or on-premise? Findwise offers implementation and consulting services for both scenarios. This post is the first in a series of four articles providing several best practices on how to implement and customise search in SharePoint. The focus of this first post is introducing the difference between the cloud and on-premise SharePoint 2013 in terms of search features.

“The cloud is on fire”

That is a quote from Jared Spataro, General Manager for Microsoft Office, during his keynote at the SharePoint Conference in Las Vegas last month. At the conference, Microsoft revealed that 60% of the Fortune 500 companies had adopted Office 365 in the previous 12 months. While new on-premise versions of SharePoint and Exchange Server are still promised for next year, Microsoft is adding more and more capabilities to the cloud version.

SPC14 Keynote summary

Fun random facts about SharePoint Online presented during the keynote at the SharePoint conference in Las Vegas this year (March 3rd 2014)

In addition to the numbers above, a market analysis report done by The Radicati Group on the adoption of Microsoft SharePoint reveals that almost a quarter of the worldwide users accessing deployments of SharePoint made during the year 2013 are using the cloud based SharePoint.

When deciding whether to go for the on-premise or cloud solution, a go-to resource for your IT team is the TechNet article describing the availability of features across the solutions. That article not only divides the features between on-premise and cloud, but also between the different Office 365 and SharePoint Online plans. What is the difference? SharePoint Online is the cloud version of SharePoint Server, but it can be deployed as a standalone service or as part of the Office 365 suite, so different plans are usually listed for these different scenarios. There are also the Office 365 Dedicated plans, but these are outside the scope of this article. The Microsoft Office site has a more business-oriented comparison of the different plans, including pricing. If you cannot decide on one or the other, a hybrid solution is also possible!

Availability of search features. Column key:
  (1) Office 365 Small Business; Office 365 Small Business Premium
  (2) Office 365 Midsize Business; Office 365 Enterprise E1 or K1; Office 365 Education A2; Office 365 Government G1 or K1
  (3) Office 365 Enterprise E3 or E4; Office 365 Education A3 or A4; Office 365 Government G3 or G4
  (4) SharePoint Online Plan 1
  (5) SharePoint Online Plan 2
  (6) SharePoint Foundation 2013
  (7) SharePoint Server 2013 Standard CAL
  (8) SharePoint Server 2013 Enterprise CAL

Search feature                      (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)

Available within all plans
Phonetic name matching              Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Expertise Search                    Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Quick preview                       Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
RESTful Query API/Query OM          Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Result sources                      Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Search results sorting              Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Ranking models                      Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Query spelling correction           Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Refiners                            Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
Manage search schema                Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes

Available in all Office365 and SharePoint Online plans
Deep links                          Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Event-based relevancy               Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Graphical refiners                  Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Recommendations                     Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Search vertical: “Conversations”    Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Search vertical: “People”           Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Query suggestions                   Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Query throttling                    Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
“This List” searches                Yes  Yes  Yes  Yes  Yes  No   Yes  Yes
Query rules – Add promoted results  Yes  Yes  Yes  Yes  Yes  No   Yes  Yes

Avail. in Office365
Advanced Content Processing         Yes  Yes  Yes  No   No   Yes  Yes  Yes
Hybrid search                       No   Yes  Yes  Yes  Yes  Yes  Yes  Yes
Query rules – advanced actions      No   No   Yes  No   No   No   No   Yes
Search vertical: “Video”            No   No   Yes  No   Yes  No   No   Yes

Not available in any of the Office 365, SharePoint Online plans
Search connector framework          No   No   No   No   No   No   Yes  Yes
Custom entity extraction            No   No   No   No   No   No   No   Yes
Extensible content processing       No   No   No   No   No   No   No   Yes

– Simplified view of the TechNet article, focusing on the search features availability across SharePoint solutions

Limitations in Office 365 and SharePoint Online plans

Is the cloud version good enough for your organisation when it comes to search features? The table above illustrates some of the things that you might be missing in terms of search, and in what follows we will discuss those whose availability varies amongst the Office 365 or SharePoint Online plans.

Query rules – advanced actions

In order to adapt the relevance of the search results to the user intent, SharePoint 2013 adds a new feature called query rules. A query rule is defined by a condition and a corresponding action to be taken when the condition is met. Within some SharePoint Online licenses, this functionality is limited to adding promoted results, while more advanced actions are left out. Promoted results are similar to what previous SharePoint versions called search keywords, or best bets: they let you promote specific results to the top of the ranked search results. The more advanced actions could consist of, for example, changing the query or changing the ranking of the search results by promoting a certain group of results. You can read more about various usages of query rules in one of our previous blog posts.
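
The condition/action structure of a query rule can be illustrated with a small sketch. This is not SharePoint's API: the `QueryRule` class, the `apply_rules` function and the example result identifiers are hypothetical names, used only to show how a promoted result ends up above the ranked results when a condition is met.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueryRule:
    """A rule pairs a condition on the query with results to promote."""
    condition: Callable[[str], bool]   # hypothetical: returns True when the rule fires
    promoted: List[str]                # results placed above the ranked results


def apply_rules(query: str, ranked: List[str], rules: List[QueryRule]) -> List[str]:
    """Return promoted results (for every matching rule) followed by the ranked results."""
    promoted: List[str] = []
    for rule in rules:
        if rule.condition(query):
            promoted.extend(r for r in rule.promoted if r not in promoted)
    # Do not list a promoted result a second time further down.
    return promoted + [r for r in ranked if r not in promoted]


# Illustrative rule: promote the travel policy when the query mentions "travel".
rules = [QueryRule(lambda q: "travel" in q.lower(), ["intranet/travel-policy"])]
results = apply_rules("Travel expenses", ["doc-a", "doc-b"], rules)
```

A more advanced action, such as rewriting the query, would replace the `promoted` list with a function that transforms the query before it is executed.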

Search Connector Framework and Hybrid Search

Administrators of SharePoint Online will miss the ability to manage search connectors to different content sources, since the search connector framework is not available. Only SharePoint content that is stored online is indexed. Search results can only be retrieved from that content, or can be set up to be retrieved from an Exchange Server, from a remote SharePoint, or from a search engine that uses the OpenSearch protocol. As an alternative approach to making content from other sources searchable, you can set up hybrid search. This feature is available in almost all Office 365 and SharePoint Online scenarios. It allows search results to be shown from content available both in the cloud and on-premise. So if you would like to index a content source that is not supported in SharePoint Online, you should be able to index it on-premise.

Custom Entity Extraction

The TechNet article describing features across solutions actually shows that this feature is only available with the enterprise licensing of SharePoint Server. This feature allows the extraction of custom-defined terms from your content and making them usable as search refiners. Say for example that you would like to extract all of your current product names from the content of your documents and then be able to refine your search results on the product name.
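
As a rough illustration of the idea (not of how SharePoint implements it), dictionary-based extraction can be sketched as follows. The product names, the documents and the function names are made up for the example.

```python
import re
from typing import Dict, List, Set

# Hypothetical dictionary of product names.
PRODUCT_NAMES = ["Alpha Router", "Beta Switch"]


def extract_products(text: str, dictionary: List[str] = PRODUCT_NAMES) -> Set[str]:
    """Return the dictionary entries that occur in the text as whole words (case-insensitive)."""
    found = set()
    for name in dictionary:
        if re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE):
            found.add(name)
    return found


def refiner_counts(documents: Dict[str, str]) -> Dict[str, int]:
    """Count how many documents mention each product, as a refiner panel would show."""
    counts: Dict[str, int] = {}
    for text in documents.values():
        for name in extract_products(text):
            counts[name] = counts.get(name, 0) + 1
    return counts


docs = {
    "d1": "Installation guide for the Alpha Router.",
    "d2": "Comparing the alpha router with the Beta Switch.",
}
counts = refiner_counts(docs)
```

The resulting counts per product name are what a user would see as refiner values next to the search results.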

Content Processing Extensibility

The other search feature that is only available with the enterprise licensing of SharePoint Server is the content processing extensibility. In practice, this means there is an API that can be used to transform the data before it is stored in the index. For example, more advanced entity extraction can be made at this step. While the custom entity extraction discussed previously is able to identify names in the content based on a pre-defined list of names, through this API you can use a trained model to do entity extraction for example. Additional use cases could be cleaning or normalising the data according to predefined rules (for example, lowercasing all values in a property), or automatically tagging items based on the content.
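
The kind of transformation meant here can be sketched as a function that receives the properties of an item and returns a modified copy before indexing. This does not reproduce the actual SharePoint interface; the property names `Department`, `Body` and `Topic` and the keyword rule are assumptions made for the example.

```python
from typing import Dict


def process_item(properties: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the item's properties, normalised and tagged before indexing."""
    processed = dict(properties)
    # Normalise: lowercase all values of one property (hypothetical property name).
    if "Department" in processed:
        processed["Department"] = processed["Department"].lower()
    # Tag: add a property derived from the content (a trivial keyword rule).
    body = processed.get("Body", "").lower()
    processed["Topic"] = "invoice" if "invoice" in body else "other"
    return processed


item = process_item({"Department": "Sales", "Body": "Invoice 2014-03 attached"})
```

A trained entity extraction model would take the place of the keyword rule in the tagging step.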

It should be noted that the TechNet article is not a comprehensive list, and rather gives an overview of the major differences between solutions. Here is for example one more feature whose availability is limited.


One of the missing features in SharePoint Online that is available in the on-premise solution is the possibility of defining synonyms. Since it’s too easy to communicate the same thing with different words, defining synonyms or abbreviations for search phrases can help aggregate the results for the multiple ways of expressing the same information need. We hope that Microsoft will integrate this feature in the future versions of SharePoint Online as well.
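
What synonym handling does to a query can be shown with a minimal sketch. The thesaurus entries and the `expand_query` function are hypothetical, and the OR syntax is only a generic illustration of query expansion.

```python
from typing import Dict, List

# Hypothetical thesaurus: each term maps to its synonyms or abbreviations.
SYNONYMS: Dict[str, List[str]] = {
    "cv": ["resume", "curriculum vitae"],
    "hr": ["human resources"],
}


def expand_query(query: str, thesaurus: Dict[str, List[str]] = SYNONYMS) -> str:
    """Rewrite each term that has synonyms as an OR group; other terms stay as they are."""
    parts = []
    for term in query.split():
        alternatives = thesaurus.get(term.lower())
        if alternatives:
            group = [term] + alternatives
            parts.append("(" + " OR ".join(f'"{t}"' for t in group) + ")")
        else:
            parts.append(term)
    return " ".join(parts)


expanded = expand_query("cv template")
```

With this expansion, a document that only mentions "resume" is still found by a user searching for "cv".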

Find the right documentation

When searching the Microsoft website or TechNet for which functionality is available across solutions, make sure to check that the discussed functionality applies to your version of SharePoint. Articles usually indicate which versions the functionality applies to.

Feature availability in MS articles

Articles on (left) and TechNet (right) indicate which version of SharePoint the discussed topic applies to.

Please note that things might change: new updates to SharePoint Online can add functionality that was missing before. To stay up-to-date, check the TechNet page once in a while, or contact us to help you map your requirements to the available search features across solutions.

Event driven indexing for SharePoint 2013

In a previous post, we have explained the continuous crawl, a new feature in SharePoint 2013 that overcomes previous limitations of the incremental crawl by closing the gap between the time when a document is updated and when the change is visible in search. A different concept in this area is event driven indexing.

Content pull vs. content push

In the case of event driven indexing, the index is updated in real time as an item is added or changed. The event of updating the item triggers the actual indexing of that item, i.e. pushes the content to the index. Similarly, deleting an item results in the item being deleted from the index immediately, making it unavailable in the search results.

The three types of crawl available in SharePoint 2013 (full, incremental and continuous) all use the opposite method: pulling content. A crawl is initiated by the user, or is automated to start at a specified time or at regular intervals.

The following image outlines the two scenarios: the first one illustrates crawling content on demand (as it is done for the full, incremental and continuous crawls) and the second one illustrates event-driven indexing (immediately pushing content to the index on an update).

Pulling vs pushing content, showing the advantage of event driven indexing
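
The difference between the two models can also be sketched in a few lines of code. The `Repository` and `Index` classes below are hypothetical stand-ins for a content source and a search index; they are not part of SharePoint or of any connector.

```python
from typing import Callable, Dict, List, Optional


class Repository:
    """Holds documents and notifies subscribers when one changes."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.listeners: List[Callable[[str, Optional[str]], None]] = []

    def save(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        for listener in self.listeners:      # push: the event triggers indexing
            listener(doc_id, text)

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)
        for listener in self.listeners:
            listener(doc_id, None)           # None signals a deletion


class Index:
    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def on_change(self, doc_id: str, text: Optional[str]) -> None:
        """Push model: called by the repository for every change."""
        if text is None:
            self.entries.pop(doc_id, None)
        else:
            self.entries[doc_id] = text

    def crawl(self, repo: Repository) -> None:
        """Pull model: rebuild the index from the repository when asked to."""
        self.entries = dict(repo.docs)


repo = Repository()
pulled, pushed = Index(), Index()
repo.listeners.append(pushed.on_change)

repo.save("report", "Q1 figures")
stale = "report" not in pulled.entries   # the pull index has not crawled yet
fresh = "report" in pushed.entries       # the push index was updated by the event
pulled.crawl(repo)                       # the change appears only after the crawl
```

In the pull model the delay depends on when the next crawl runs; in the push model it depends only on how long the change takes to process.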

Example use cases

The following examples are only some of the use cases where an event-driven push connector can make a big difference in terms of the time until the users can access new content or newest versions of existing content:

  • Be alerted instantly when an item of interest is added in SharePoint by another user.
  • Have deleted content removed from search immediately.
  • Avoid annoying situations when adding or updating a document to SharePoint and not being able to find it in search.
  • View real-time calculations and dashboards based on your content.

Findwise SharePoint Push connector

Findwise has developed a connector for its SharePoint customers that does event driven indexing of SharePoint content. After installing the connector, a full crawl of the content is required, after which all updates will be instantly available in search. The delay between the time a document is updated and the time it becomes available in search is reduced to the time it takes for the document to be processed (that is, to be converted from what you see to a corresponding representation in the search index).

Both FAST ESP and FAST Search for SharePoint 2010 (FS4SP) allow for pushing content to the index; however, this capability was removed from SharePoint 2013. This means that even though we can capture changes to content in real time, we are missing the interface for sending the update to the search index. This might be a game changer for you if you want to use SharePoint 2013 and take advantage of event driven indexing, since it means you would have to use another search engine that has an interface for pushing content to the index. We have ourselves used a free open source search engine for this purpose. By moving the search index outside the SharePoint environment, the search can be integrated with other enterprise platforms, opening up possibilities for connecting different systems together by search. Findwise can assist you with choosing the right tools to get the desired search solution.

Another aspect of event driven indexing is that it limits the resources required to traverse a SharePoint instance. Instead of continuously running a process that looks for changes, those changes arrive automatically when they occur, limiting the work required to pick them up. This is an important aspect, since the resource demand for keeping an index updated can at times be very high in SharePoint installations.

There is also a downside to consider when working with push driven indexing. It is more difficult to keep a state of the index in case problems occur. For example, if one of the components of the connector goes down and no pushed data is received during a time interval, it becomes more difficult to follow up on what went missing. To catch the data that was added or updated during the down period, a full crawl needs to be run. Catching deletes is solved by either keeping a state of the current indexed data, or comparing it with the actual search engine index during the full crawl. Findwise has worked extensively on choosing reliable components with a high focus on robustness and stability.
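
The comparison step during such a recovery crawl can be sketched as follows. This is a simplified illustration, not the connector's implementation: it assumes that both the source and the index can report a version marker per document id, and the `reconcile` function is a hypothetical name.

```python
from typing import Dict, Set, Tuple


def reconcile(source: Dict[str, str], index: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Compare document versions in the source with those in the index.

    Both arguments map a document id to a version marker (for example a
    modification timestamp). Returns (ids to re-index, ids to delete).
    """
    to_index = {doc_id for doc_id, version in source.items()
                if index.get(doc_id) != version}
    to_delete = set(index) - set(source)
    return to_index, to_delete


source_state = {"a": "v2", "b": "v1", "d": "v1"}   # what the full crawl finds
index_state = {"a": "v1", "b": "v1", "c": "v1"}    # what the search index holds
to_index, to_delete = reconcile(source_state, index_state)
```

Here document "a" was changed and "d" was added while the connector was down, so both are re-indexed, while "c" no longer exists in the source and is removed from the index.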

The push connector was used in projects with both SharePoint 2010 and 2013 and tested with SharePoint 2007 internally. Unfortunately, SharePoint 2007 has a limited set of event receivers which limits the possibility of pure event driven indexing. Also, at the moment the connector cannot be used with SharePoint Online.

You will probably be able to add a few more examples to the use cases for event driven indexing listed in this post. Let us know what you think! And get in touch with us if you are interested in finding out more about the benefits and implications of event driven indexing, and in learning how to reach the next level of findability.

Continuous crawl in SharePoint 2013

Continuous crawl is one of the new features that come with SharePoint 2013. As an alternative to the incremental crawl, it promises to improve the freshness of the search results, that is, to shorten the time between when an item is updated in SharePoint by a user and when it becomes available in search.

Understanding how this new functionality works is especially important for SharePoint implementations where content changes often and/or where it’s a requirement that the content should instantly be searchable. Moreover, since many of the new SharePoint 2013 functionalities depend on search (see the social features, the popular items, or the content by search web parts), understanding continuous crawl and planning accordingly can help align user expectations with the technical capabilities of the search engine.

Both the incremental crawl and the continuous crawl look for items that were added, changed or deleted since the last successful crawl, and update the index accordingly. However, the continuous crawl overcomes the limitation of the incremental crawl, since multiple continuous crawls can run at the same time. Previously, an incremental crawl would start only after the previous incremental crawl had finished.

Limitation to content sources

Content not stored in SharePoint will not benefit from this new feature. Continuous crawls apply only to SharePoint sites, which means that if you are planning to index other content sources (such as File Shares or Exchange folders) your options are restricted to incremental and full crawl only.

Example scenario

The image below shows two situations. The image on the left (Scenario 1) shows incremental crawls scheduled to start every 15 minutes. The image on the right (Scenario 2) shows a similar scenario where continuous crawls are scheduled every 15 minutes. Around 7 minutes after the crawl starts, a user updates a document. Let’s also assume that in this case passing through all the items to check for updates takes 44 minutes.

Continuous crawl SharePoint 2013

Incremental vs continuous crawl in SharePoint 2013

In Scenario 1, although incremental crawls are scheduled every 15 minutes, a new incremental crawl cannot be started while another incremental crawl is running. The next incremental crawl will only start after the current one has finished. In this scenario it takes 44 minutes for the first incremental crawl to finish, after which the next incremental crawl kicks in, finds the updated document and sends it to the search index. This scenario shows that it could take around 45 minutes from the time the document was updated until it is available in search.

In Scenario 2, a new continuous crawl will start every 15 minutes, as multiple continuous crawls can run in parallel. The second continuous crawl will see the updated document and send it to the search index. By using the continuous crawl in this case, we have reduced the time it takes for a document to be available in search from around 45 minutes to 15 minutes.
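
The arithmetic behind the two scenarios can be reproduced with a short sketch. It rests on two simplifying assumptions that are ours, not a description of SharePoint's scheduler: a crawl that is already running when the document is updated does not pick up the change, and an incremental crawl waits for the next scheduled slot after the previous crawl has finished. The function name is hypothetical, and times are measured in minutes from the start of the first crawl.

```python
import math


def first_crawl_seeing_update(update_at: float, interval: float,
                              crawl_duration: float, parallel: bool) -> float:
    """Return the start time (minutes) of the first crawl that starts after the update.

    Crawls are scheduled every `interval` minutes starting at t = 0.
    If `parallel` is False, a crawl can only start at a scheduled slot once the
    previous crawl has finished (incremental crawl). If True, a new crawl starts
    at every slot regardless (continuous crawl).
    """
    start = 0.0
    while start <= update_at:
        if parallel:
            start += interval
        else:
            finished = start + crawl_duration
            # Next scheduled slot at or after the moment the previous crawl finished.
            start = math.ceil(finished / interval) * interval
    return start


incremental = first_crawl_seeing_update(7, 15, 44, parallel=False)   # Scenario 1
continuous = first_crawl_seeing_update(7, 15, 44, parallel=True)     # Scenario 2
```

Under these assumptions the sketch yields crawl start times of 45 and 15 minutes, which match the approximate figures used in the two scenarios.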

Not enabled by default

Continuous crawls are not enabled by default. They are enabled in the same place as incremental crawls: in Central Administration, under the Search Service Application, per content source. The interval at which a continuous crawl starts defaults to 15 minutes, but it can be changed through PowerShell to a minimum of 1 minute if required. Lowering the interval will, however, increase the load on the server. Another number to take into consideration is the maximum number of simultaneous requests, which is also configured from Central Administration.

Continuous crawl in Office 365

Unlike in SharePoint 2013 Server, continuous crawls are enabled in SharePoint Online by default but are managed by Microsoft. For those used to the Central Administration from the on-premise SharePoint server, it might sound surprising that this is not available in SharePoint Online. Instead, there is a limited set of administrative features. Most of the search features can be managed from this administrative interface, though the ability to manage the crawling on content sources is missing.

The continuous crawl in Office 365 is limited by the lack of control and configuration options. The crawl frequency cannot be modified. Microsoft targets between 15 minutes and one hour between a change and its availability in the search results, though in some cases it can take hours.

Closer to real-time indexing

The continuous crawl in SharePoint 2013 overcomes previous limitations of the incremental crawl by closing the gap between the time when a document is updated and when this is visible in the search index.

A different concept in this area is the event driven indexing, which we will explain in our next blog post. Stay tuned!

Entity Recognition with Google Search Appliance 7.2


In this article we would like to present some of the possibilities offered by the entity recognition option of Google Search Appliance (GSA). Entity recognition was introduced with the release of version 7.0 and improvements will still be added in future releases. We have used version 7.2 to write this blogpost and illustrate how GSA can perform named-entity recognition and sentiment analysis.

Entity Recognition in brief

Entity recognition enables the GSA to discover entities (such as names of people, places, organizations, products, dates, etc.) in documents where these are not available in the metadata, or where they are otherwise needed to enhance the search experience (e.g. via faceted search/dynamic navigation). There are three ways of defining entities:

  • With a TXT format dictionary of entities, where each entity type is in a separate file.
  • With an XML format dictionary, where entities are defined by synonyms and regular expressions. Currently, the regular expressions only match single words.
  • With composite entities written as an LL1 grammar.

Example 1: Identifying people

The basic setup for recognition of person names is to upload a dictionary of first names and a dictionary of surnames. Then, you can create a composite entity full name by using a simple LL1 grammar rule, for example {fullname}::=[firstname] [surname]. Every first name in your dictionary, followed by a space and then followed by a surname will be recognized as a full name. With the same approach, you can define more complex full names such as:

{fullName} ::= {Title set} {Name set} {Middlenames} {Surname set}
{Title set} ::= [Title] {Title set}
{Title set} ::= [epsilon]
{Name set} ::= [Name] {Name set2}
{Name set2} ::= [Name] {Name set2}
{Name set2} ::= [epsilon]
{Middlenames} ::= [Middlename]
{Middlenames} ::= [epsilon]
{Surname set} ::= [Surname] {Surname set2}
{Surname set2} ::= [Surname] {Surname set2}
{Surname set2} ::= [epsilon]

A full name will be recognized if it matches zero or more titles (the {Title set} rule is recursive), one or more first names, 0 or 1 middle names and one or more surnames, all separated by a space (e.g. Dr John Anders Lee).
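
To make the grammar easier to follow, here is a small recognizer written outside the GSA that accepts the same sequences. The dictionaries are made up and the function is only a sketch; it relies on the dictionaries not overlapping, so that a greedy left-to-right match is sufficient.

```python
from typing import List, Set

# Hypothetical dictionaries standing in for the uploaded entity dictionaries.
TITLES: Set[str] = {"Dr", "Prof"}
NAMES: Set[str] = {"John", "Anna"}
MIDDLENAMES: Set[str] = {"Anders"}
SURNAMES: Set[str] = {"Lee", "Berg"}


def is_full_name(text: str) -> bool:
    """Check whether the whole string derives from {fullName} in the grammar above."""
    tokens: List[str] = text.split(" ")
    pos = 0

    def take_while(dictionary: Set[str]) -> int:
        """Consume consecutive tokens found in the dictionary; return how many."""
        nonlocal pos
        count = 0
        while pos < len(tokens) and tokens[pos] in dictionary:
            pos += 1
            count += 1
        return count

    take_while(TITLES)                      # {Title set}: zero or more titles
    if take_while(NAMES) < 1:               # {Name set}: one or more names
        return False
    if pos < len(tokens) and tokens[pos] in MIDDLENAMES:
        pos += 1                            # {Middlenames}: zero or one
    if take_while(SURNAMES) < 1:            # {Surname set}: one or more surnames
        return False
    return pos == len(tokens)               # every token must be consumed
```

For example, `is_full_name("Dr John Anders Lee")` and `is_full_name("John Lee")` are accepted, while `is_full_name("Dr Lee")` is rejected because at least one first name is required.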


Limitations of this approach:

  • All the names in the content will be matched.
  • Common words similar to names will be matched, for example Charlotte Stone. To reduce this limitation, you can enable the case sensitive option and match a full name.
  • In the preceding example, Dr John Anders Lee and John Anders Lee will be recognized as different persons.
  • There is no support for multiple entities within composite entities. John Anders Lee will be matched as a full name, but John will not be matched as a name.


Example 2: Identifying places

Place names such as cities, countries, streets can be easily defined with the help of dictionaries in TXT format. One can also define locations by using regular expressions, especially if these share the same substring (e.g. “street” or “square”). For example, a Swedish street will often contain the substring “gata”, meaning “street”:

<name> Street </name>

This will allow us to identify one-word places like “Storgatan” and “Järntorget”, but will fail in cases where the name consists of two or more words, such as “Olof Palmes plats”.

Swedish postal codes can be defined with a regex matching 5 digits. Note, however, that all numbers of 5 digits will be matched as a postal code and that you cannot define space in the postal code due to the regular expression limitation of the GSA only matching a single word.

You can use the synonyms function of the xml dictionary to link a postal code with a city.

<name> Göteborg </name>

40330, 40510, 41190 and 41302 will be recognized as the entity Göteborg.
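
The behaviour described above, including the limitation with spaces, can be checked with an ordinary regular expression outside the GSA. This is a sketch using Python's `re` module, not GSA configuration; the mapping simply reuses the postal codes listed above, and the names are hypothetical.

```python
import re
from typing import Dict, List

# Five consecutive digits forming a single word, as described above.
POSTAL_CODE = re.compile(r"\b\d{5}\b")

# Synonym-style mapping from postal code to city, using the codes listed above.
CITY_BY_CODE: Dict[str, str] = {
    code: "Göteborg" for code in ("40330", "40510", "41190", "41302")
}


def find_cities(text: str) -> List[str]:
    """Return the city for every five-digit number that is a known postal code."""
    return [CITY_BY_CODE[code] for code in POSTAL_CODE.findall(text)
            if code in CITY_BY_CODE]


matches = POSTAL_CODE.findall("Box 12345, 411 90 or 41190")
cities = find_cities("Deliver to 41190 before noon")
```

The pattern matches both "12345" and "41190" but not "411 90", which shows the two limitations at once: any five-digit number is treated as a postal code, and a code written with a space is missed.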

You can also use the synonyms to describe a territory division (kommun, län, country).

     <name> Göteborg Stad</name> 
     <term> Angered </term>
     <term> Backa </term>
     <term> Göteborg </term>
     <term> Torslanda </term>
     <term> Västra Frölunda </term>
     <name> Öckerö </name>
     <term> Hönö </term>
     <term> Öckerö </term> 
     <term> Rörö </term>



Example 3: Sentiment analysis

Sentiment analysis aims at identifying the predominant mood (happy/sad, anger/happiness, positive/negative, etc) of a document by analyzing its content. Here we will show you a simple case of identifying positive vs negative mood in a document.

Basic analysis

For a basic analysis one can create two dictionaries, one with positive words (good, fine, excellent, like, love …) and one with negative words (bad, dislike, don’t, not …). Such an analysis is simplistic and very limited for the following reasons:

  • There is no real grammar
  • Limited coverage of the lexicons
  • No degree of judgment
  • No global analysis of the document (if a document has 3 different polarity words it will be tagged with 3 different categories)

Analysis with grammar

If you add a dictionary of negations, you can create a more powerful tool with just a small grammar of composite entities. For example, {en negative} ::= [en negation] [en positive word] will correctly identify the English “not good”, “don’t like” and “didn’t succeed” as negative terms. One can certainly create a deeper analysis with a more advanced grammar. Thus you can specify special dictionaries for gender, emphatic words, nouns, verbs, adjectives, etc. and build composite entities and grammar rules with them. Below you see an example of the application of a simple grammar.
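
The effect of the negation rule can be sketched in a few lines, independently of the GSA. The dictionaries and the `tag_sentiment` function are hypothetical; the only rule implemented is that a negation followed by a positive word is tagged as negative.

```python
from typing import List, Tuple

# Hypothetical English dictionaries.
POSITIVE = {"good", "like", "succeed"}
NEGATIVE = {"bad", "dislike"}
NEGATION = {"not", "don't", "didn't"}


def tag_sentiment(text: str) -> List[Tuple[str, str]]:
    """Tag phrases as positive or negative; a negation followed by a positive word is negative."""
    tokens = text.lower().split()
    tags: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if token in NEGATION and nxt in POSITIVE:
            tags.append((f"{token} {nxt}", "negative"))   # [negation] [positive word]
            i += 2
            continue
        if token in POSITIVE:
            tags.append((token, "positive"))
        elif token in NEGATIVE:
            tags.append((token, "negative"))
        i += 1
    return tags


tags = tag_sentiment("The food was not good but I like the view")
```

Without the negation rule, "good" in this sentence would have been tagged as positive.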

Degrees of sentiment

You can also add some degrees in the sentiments using the synonyms feature.

  <name> Good </name>
  <term> good </term>
  <term> fine </term>
  <term> like </term>
  <name> Very Good </name>
  <term> excellent </term>
  <term> amazing </term>
  <term> great </term>
  <name> Bad </name>
  <term> bad </term>
  <term> dislike </term>
  <term> don’t </term>
  <term> can’t </term>
  <term> not </term>
  <name> Very Bad </name>
  <term> awful </term>
  <term> hate </term>

Note, however, that you cannot combine such synonym entries with other entity dictionaries or grammar rules.


There are some limitations of this approach as well:

  • No possibility to extract global sentiment for a given document. You cannot count in a document how many terms are matched as good and how many are matched as bad and then define the global sentiment for this document. However, when the regular expression limitations are fixed, one will be able to do so.
  • As with sentiment analysis in general and other dictionary-based approaches, it is hard to detect sarcasm and irony.


In this blog post we showed how one can use the Entity recognition feature of GSA 7.2. While there are still some limitations of the tools provided, they are mature enough to enhance your search solution. Depending on the type of data, one can do simple sentiment analysis as well as more complex recognition of entities by using LL1 grammar.

A nice add-on to the Entity recognition setup in the GSA would be the possibility to load pre-trained models for Named Entity Recognition or sentiment analysis.


Reaching Findability #6

Findability is surprisingly complex due to the large number of measures needed to be understood and undertaken. I believe that one of the principal challenges lies within the pedagogical domain. This is my sixth and last post in a series of simple tips for reaching Findability.

Understand your business case!

Many decisions are based on gut-feeling rather than solid business cases. Improving Findability is often a good investment! Being able to back that statement up with numbers makes it easier to move forward with the decision.

For some applications of search, it is relatively easy to show the benefits in numbers. Improving Findability on a web page can increase conversion rates and online sales, or decrease the need for customer support, to mention a few examples.

It is, however, more difficult to put numbers on internal applications of search. Finding information quicker and easier obviously saves time, but is time always worth money? I believe the first and foremost benefit may be that decisions can be made based on better information.

If finding relevant information is made quick and easy enough, the willingness to actually spend time using it for decision-making will increase. Search can give you a better overview of what information is available, and provide you with information you didn’t know existed or previously didn’t have access to. And it can serve information you might be interested in because of your position and context.

A solid business case for working with search is more or less difficult to find. That does not, however, remove the need for working on it. One benefit is keeping focus on what really matters to your organization. Another is convincing others that new ways of working with information, as a strategic asset, matter. Fortunately, there are good methods to find, express and measure the benefits of Findability to make your business case solid!

For some inspiration and insight into the general state of findability you can download and read the report from our global survey, Enterprise Search & Findability Survey 2013.

If you are interested in more details about how to achieve Findability you can also download and read our whitepaper “Best practices for search”!

How relevance models work

A relevance model is what a search engine uses to rank documents in a search result, i.e. how it finds the document you are looking for. An axiomatic analysis of relevance models is asking the questions: how and why does a relevance model work? Findwise attended the ICTIR 2013 conference in Copenhagen where one of the recurring topics was the axiomatic analysis of relevance models.

The relevance model is represented through a mathematical function of a set of input variables, and therefore just by looking at its formula it is likely to be very difficult to answer those two questions. What the axiomatic analysis aims to do is to break down the formulas and to isolate and analyze each of its individual components, with the goal of making improvements in the performance.

The idea is to formulate a set of axioms, meaning laws that a relevance model should abide by. One of the more obvious axioms, from a purely statistical point of view, relates to term frequency (TF): a document d1, in which the terms of the query occur more times than in some other document d2, is to be assigned a higher relevance than d2. These are called axioms because they should be relevance truths: statements that are obvious and that everyone can agree on. Other examples of axioms could be that very long documents should be penalized, simply because they have a higher probability of containing any given word, and that terms frequent in many documents should contribute less to the relevance than terms that are more unique.
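
The three axioms mentioned above can be checked against a toy scoring function. The function below is a simple TF-IDF variant with length normalisation, chosen by us for illustration; it is not a model presented at the conference, and the documents are made up.

```python
import math
from typing import List


def score(query: List[str], doc: List[str], corpus: List[List[str]]) -> float:
    """A simple TF-IDF score with length normalisation (illustrative only)."""
    total = 0.0
    for term in query:
        tf = doc.count(term)                               # term frequency
        df = sum(1 for d in corpus if term in d)           # document frequency
        idf = math.log((len(corpus) + 1) / (df + 1)) + 1   # rarer terms weigh more
        total += tf * idf
    return total / math.sqrt(len(doc))                     # penalise long documents


d1 = ["search", "search", "engine", "index"]
d2 = ["search", "engine", "index", "crawl"]
d4 = d1 + ["filler"] * 12        # same term frequency as d1, but much longer
corpus = [d1, d2, d4]

# TF axiom: more occurrences of the query term give a higher score (same length).
tf_axiom = score(["search"], d1, corpus) > score(["search"], d2, corpus)
# Length axiom: with equal term frequency, the longer document scores lower.
length_axiom = score(["search"], d1, corpus) > score(["search"], d4, corpus)
# Frequency-in-collection axiom: a rare term contributes more than a common one.
idf_axiom = score(["crawl"], d2, corpus) > score(["search"], d2, corpus)
```

Writing the axioms as executable checks like these is one way to compare candidate relevance models: a model that fails a check violates the corresponding axiom on that example.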

From an Enterprise Search perspective, these axioms do not have to be general relevance truths, but more adapted to your organization and your users. Here we see a shift in the type of axioms from pure statistics-based towards more metadata-based, e.g. which fields are more relevant than others and which sources are more relevant. A very simple example of this is that a hit in the title is more relevant than a hit in the body. These are usually conveniently configurable in most search engines, e.g. Apache Solr.
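
A metadata-based axiom such as "a hit in the title is more relevant than a hit in the body" amounts to giving fields different weights. The sketch below uses made-up weights and documents; in Apache Solr, a comparable effect is typically achieved with per-field boosts, for example through the `qf` parameter of the eDisMax query parser.

```python
from typing import Dict

# Hypothetical field weights: a hit in the title counts three times as much as a hit in the body.
FIELD_WEIGHTS: Dict[str, float] = {"title": 3.0, "body": 1.0}


def field_score(term: str, document: Dict[str, str],
                weights: Dict[str, float] = FIELD_WEIGHTS) -> float:
    """Sum the weighted number of occurrences of the term across fields."""
    term = term.lower()
    return sum(weight * document.get(field, "").lower().split().count(term)
               for field, weight in weights.items())


title_hit = {"title": "Travel policy", "body": "Rules for business trips"}
body_hit = {"title": "Staff handbook", "body": "Includes the travel policy"}
```

With these weights, `field_score("travel", title_hit)` is 3.0 and `field_score("travel", body_hit)` is 1.0, so the document with the title hit is ranked first.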

This concept is useful and interesting for many reasons: it not only allows you to modify and improve existing relevance models, but also to create new ones from scratch. This process can also be automated using Machine Learning algorithms, which leaves us with the task of finding the optimal set of axioms. Can you think of axioms that can be applied to your organization, your users’ information needs and the content that is made searchable?

Reaching Findability #5

Findability is surprisingly complex due to the large number of measures needed to be understood and undertaken. I believe that one of the principal challenges lies within the pedagogical domain. This is my fifth post in a series of simple tips for reaching Findability.

Effect driven development!

Most projects are undertaken to achieve operational improvements in an organization. An often relied upon truth is that IT driven, or what I’d like to call “feature driven”, projects are more likely to fail than projects with a clear focus on business benefits. Many pitfalls can be avoided by using a structured approach while making business benefits the centre of attention.

An excellent tool for keeping the focus on satisfying real business needs, rather than developing features only a few will use, is Effect Mapping, developed by InUse AB. The basic idea of this method is to define what actual business effects should result from a project, and to identify the target groups contributing to these effects and their specific needs. It is all visualized in a so-called Effect Map.

The Effect Map is a great tool for communicating goals and means needed to achieve them. It can be used throughout an entire project to manage changing requirements, keep the business in focus and prioritize what really matters; all while new knowledge is gained.

Development of an Effect Map takes place during a series of workshops, providing valuable insights to any project. It is especially helpful in Findability projects, as they span many different parts of any business, from managing information and the organization to technical solutions.

A good strategy used in conjunction with an Effect Map and long-term goals provides you with a great starting point for succeeding with your Findability. You can read more about different approaches to building the business case in this blog post!

Swedish language support (natural language processing) for IBM Content Analytics (ICA)

Findwise has now extended the NLP (natural language processing) in ICA to include both support for Swedish PoS tagging and Swedish sentiment analysis.

IBM Content Analytics with Enterprise Search (ICA) has its strength in natural language processing (NLP), which is achieved in the UIMA pipeline. From a Swedish perspective, one concern with ICA has always been its lack of NLP for Swedish. Previously the Swedish support in ICA consisted only of dictionary-based lemmatization (word: “sprang” -> lemma: “springa”). However, for a number of other languages ICA has also provided part of speech (PoS) tagging and sentiment analysis. One of the benefits of the PoS tagger is its ability to disambiguate words that belong to multiple classes (e.g. “run” can be both a noun and a verb) as well as to assign tags to words that are not found in the dictionary. Furthermore, the PoS tagger is crucial when it comes to improving entity extraction, which is important when a deeper understanding of the indexed text is needed.

Findwise has now extended the NLP in ICA to include support for both Swedish PoS tagging and Swedish sentiment analysis. The two images below show simple examples of the PoS support.

Example when ICA uses NLP to analyse the string "ICA är en produkt som klarar entitetsextrahering"

Example when ICA uses NLP to analyse the string "Watson deltog i jeopardy"

The question is: how can this extended functionality be used?

IBM uses ICA and its NLP support together with several of its products. The Jeopardy!-playing computer Watson may be the most famous example, even if it is not a real product. Watson used NLP in its UIMA pipeline when it analyzed data from sources such as Wikipedia and IMDb.

One product that leverages ICA and its NLP capabilities is Content and Predictive Analytics for Healthcare. This product helps doctors determine which action to take for a patient, given the patient’s medical record and symptoms. By also leveraging the predictive analytics from SPSS, it is possible to suggest the next action for the patient.

ICA can also be connected directly to IBM Cognos or SPSS, where ICA is the tool that adds structure to unstructured data. By using the NLP or sentiment analysis in ICA, structured data can be extracted from text documents. This data can then be fed to IBM Cognos, SPSS or non-IBM products such as Splunk.

ICA can also be used on its own as a text miner or a search platform, but in many cases ICA delivers its maximum value together with other products. ICA helps enrich data by adding structure to unstructured data. The processed data can then be used by other products that normally work with structured data.

Speaking at the Agile HR Conference in Stockholm

In my role as process coordinator for Talent Management I will be speaking at the Agile HR Conference in Stockholm on November 27. My talk will focus on our approach to Talent Management and the tools we use. Findwise continuously strives to improve the Talent Management process with the ambition to attract, develop, engage and retain excellent talent. The work during 2013 has resulted in a new version of our Talent Management process and the tools we use.

Findwise speaking at Agile HR

Our view on talent management is this: talent attracts talent. To find new talented people we must make the ones who are already working here thrive and feel inspired. How do we accomplish this? Well, talented people must be respected as equals and be given the freedom to create and innovate. You don’t hire a talent to tell him or her how to do what they are talented at. That would be like hiring Michael Jackson and telling him how to write a hit song. We want our people to feel encouraged to act independently and bravely; that is how their talents are best put to use for Findwise and our clients.

Hope to see you at the conference!

Intranets that have an impact

Recently I attended EuroIA, the European information architecture summit, where experts within the area meet up to discuss, share, listen and learn.

For me, one of the highlights was James Robertson from Step Two Designs presenting some of the results from their yearly intranet awards. Intranets are fascinating in being large systems with such potential to improve daily work. However, more often than not they fail to do so. As James Robertson put it, “organizations and intranets is the place where user experience goes to die”. So, what can we do to change that?

Robertson talked about successful companies managing to create structured, social and smart intranets. Two examples were the International Monetary Fund and a Canadian law firm. Both needed easy and secure gathering and retrieval of large amounts of information. Part of their success came from mandatory classification of published documents and review of changes. Another smart solution was to keep a connection between parent documents and their derivatives, making sure that information was trustworthy and kept up to date.

Companies that excelled at social managed to bind everything together: people, projects and customers. I was happy to hear this, as we have been working a lot on this at Findwise. Our latest internal project was actually creating our own knowledge graph, connecting skills, platforms and technologies with projects and customers. What we haven’t yet dared to do, but other successful companies have, is go all in with social. Instead of providing social functionality on the side, they fully integrate their social feed into the intranet start page. This I’d like to try at Findwise.

The ugliest but smartest solution presented by James combined analytics with proper tagging of information. Imagine the following: a policy is changed and you are informed. However, you don’t need the policy until you perform a task months later. Now, the policy information is hidden in a news archive and you can’t easily find it. Annoying, right?

What CRS Australia does to solve this problem is simple and elegant. They track the pages users visit on the intranet. Whenever someone updates a page, they indicate whether it is a significant change or not. This is combined with electronic forms for everything. When a user fills in a form, information regarding policy updates pops up automatically, ensuring that users always have up-to-date information.
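The mechanism can be sketched roughly as follows. This is only my interpretation in Python, not CRS Australia's actual implementation: the names `PageUpdate` and `updates_to_show`, the integer timestamps and the idea that each form is linked to a set of policy pages are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageUpdate:
    page: str          # identifier of the updated intranet page
    timestamp: int     # when the update was made (hypothetical integer clock)
    significant: bool  # flag entered by the editor when saving the page


def updates_to_show(form_pages, last_visit, updates):
    """Return the significant updates a user has not yet seen.

    form_pages: set of pages assumed to be linked to the form being filled in.
    last_visit: dict mapping page -> timestamp of the user's latest visit.
    updates: list of PageUpdate records.
    """
    unseen = []
    for update in updates:
        if (
            update.page in form_pages
            and update.significant
            # Pages the user never visited default to timestamp 0.
            and update.timestamp > last_visit.get(update.page, 0)
        ):
            unseen.append(update)
    return unseen
```

The point of the design is that the user does not have to remember or search for the policy change; the combination of visit tracking and the "significant change" flag decides what is shown at the moment the information is needed.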

These ideas give me hope and clearly show that intranets needn’t be a place where user experience comes to die.