Text/Content Analytics 2011: User Perspectives on Solutions and Providers
Seth Grimes
An Alta Plana research study
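Sponsored by Verint, Sybase, SAS, Medallia, Lexalytics, Language Computer Corporation, Basis Technology, Attensity, and AlchemyAPI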
Published September 9, 2011 under the Creative Commons Attribution 3.0 License.
Table of Contents
Executive Summary
  Market Size and Growth
  Growth Drivers
  The 2011 Market
  The Study
  Key Study Findings
  About the Study and the Report
Text and Content Analytics Basics
  From Patterns…
  … To Structure
  Beyond Text
  Metadata
  A Focus on Applications
Applications and Markets
  Application modes
  Business Domains
  Business Functions
  Technology Domains
  Solution Providers
Demand-Side Perspectives
  Study Context
  About the Survey
  Market Size and the Larger BI Market
  The Data Mining Community
Demand-Side Study 2011: Findings
  Q1: Length of Experience
  Q2: Application Areas
  Q3: Information Sources
  Q4: Return on Investment
  Q5: Mindshare
  Q6: Spending
  Q8: Satisfaction
  Q9: Overall Experience
  Q10: Providers
  Q11: Provider Selection
  Q13: Promoter?
  Q14: Information Types
  Q15: Important Properties and Capabilities
  Q16: Languages
  Q17: BI Software Use
  Q18: Guidance
  Q19: Comments
  Additional Analysis
  Interpretive Limitations and Judgments
About the Study
Solution Profile: AlchemyAPI
Solution Profile: Attensity
Solution Profile: Basis Technology
Solution Profile: Language Computer Corp.
Solution Profile: Lexalytics
Solution Profile: Medallia
Solution Profile: SAS
Solution Profile: Sybase
Solution Profile: Verint Systems Inc.
Executive Summary
Text and content analytics have become a source of competitive advantage,
enabling businesses, government agencies, and researchers to extract
unprecedented value from “unstructured” data. Uptake is strong – software,
solutions, and services are delivering significant business value to users in a
spectrum of industries – yet the market’s potential remains far from fully realized.
These points and more are brought out in Alta Plana’s market study,
“Text/Content Analytics 2011: User Perspectives on Solutions and Providers.”
Market Size and Growth
Tools and solutions now cover the gamut of business, research, and governmental
needs. User adoption continues to grow very rapidly, an estimated 25% in 2010,
creating an $835 million market for software tools, business solutions, and
vendor-supplied support and services. Extrapolating from revenue generated by
applications and solutions (for instance, social-media analysis, e-discovery, and search),
information products created by mining content, professional services, and research,
these tools and solutions generate business value several times that figure.
The addressable market for text/content analytics is much larger. The technologies are a
subset of a larger business intelligence, analytics, and performance management software
market, which is dominated by solutions that analyze numerical data that originates in
enterprise operational systems. Gartner estimated that larger market at $10.5 billion
globally in 2010. Yet, given now-broad awareness of the business value that resides in
“unstructured” social, online, and enterprise sources, the text/content-analytics share of
that much larger market will surely grow steeply in coming years. Overall, expect annual
text/content-analytics growth averaging up to 25% for the next several years.
Growth Drivers
Several factors contribute to sustained growth, foremost the rise of social
platforms, which have become essential life tools for individuals and an important
channel for business marketing, communication, research, and commerce.
Social
Keeping up with Social is a must for every consumer-facing organization, and automated
monitoring, measurement, and engagement are the only way to deal with Social’s variety,
volume, and velocity. Leading solutions rely on natural-language processing (NLP),
provided by text/content analytics, to identify and extract facts and sentiment. Expect
even lower-end tools to embrace NLP by 2013.
Publishing, advertising, and information services
Second, text/content analytics is central to competitive online publishing and
advertising and to effective information access (essentially, next-generation search).
These are two sides of a single coin. As applied by content producers and publishers,
the technologies discover appropriate descriptive and semantic labels and associate them
with content. The aims are to improve findability in search, to allow content to be stored
and retrieved at a fine-grained level (documents as databases), and to enhance the
content consumer’s experience. As applied by search, content-aggregation,
online-advertising, and information-service providers, the technology fuels situationally
appropriate results that respond to the information/service seeker’s context and intent.
Question-answering and information access
Question-answering systems such as IBM Watson and Wolfram Alpha are examples of
next-generation, analytics-enabled information-access engines, which will play a key role
in online commerce, customer support, health-service delivery, and other applications
by early 2013. Similarly, Semantic Web information resources should finally enter
the mainstream by 2014. They will very frequently rely on analytics to semanticize and
structure content and support on-the-fly information integration.
Rich media
Last, content analytics makes sense of rich media. The technology finds and exploits
patterns – what’s in a given piece of content and how content changes over time – in
speech and sound, images, and video. Important content-analytics applications already
exist for contact centers, security, and general information access, and even in
consumer electronics: Witness face detection and tracking in consumer-grade cameras
and camcorders. Arguably, we could include analyses of social and enterprise networks,
mined from e-mail, messaging, online, and social content, under the content-analytics
umbrella.
The 2011 Market
As in prior years, no single solution provider dominates the market. Players range from
the largest enterprise software vendors to a stream of new entrants, both
commercializing research technologies and bringing solutions to new markets. In
between, established enterprise content management (ECM), BI and analytics, search,
software tools, and business-solution providers – the sponsors of this study among them –
continue to innovate and deliver business value.
The Study
Alta Plana’s 2011 text/content analytics market study combines a survey-based,
quantitative and qualitative examination of usage, perceptions, and plans with
observations derived from numerous conversations with solution providers and users. It
seeks to answer the question, “What do current and prospective text/content-analytics
users really think of the technology, solutions, and solution providers?” Responses will
help providers craft products and services that better serve users. Findings will guide
users seeking to maximize benefit for their own organizations.
Alta Plana received 224 valid survey responses between June 6 and July 9, 2011. This
document reports findings and, when appropriate, contrasts them with comparable
numbers from Alta Plana’s spring-2009 text-analytics market study.¹
¹ “Text Analytics 2009: User Perspectives on Solutions and Providers”: http://altaplana.com/TA2009
Key Study Findings
The following are key 2011 study findings:
The big news is not news at all: Social is by far the most popular source fueling
text/content analytics initiatives. Four of the top 5 information categories are
social/online (as opposed to in-enterprise) sources:
o blogs and other social media (62%)
o news articles (41%)
o on-line forums (35%)
o reviews (30%)
as well as direct customer feedback in the form of:
o customer/market surveys (35%)
o e-mail and correspondence (29%)
for an average of 4.5 sources per respondent.
The three top capabilities that users look for in a solution, each cited by more than
50% of respondents, all relate to getting the most information out of sources:
o Broad information extraction capabilities (63%)
o Ability to use specialized dictionaries, taxonomies, ontologies, or
extraction rules (57%)
o Deep sentiment/emotion/opinion extraction (57%)
Low cost was cited by 38% of 2011 respondents, down from 51% of 2009 responses.
Top business applications of text/content analytics for respondents are the
following:
o Brand / product / reputation management (39% of respondents)
o Voice of the Customer / Customer Experience Management (39%)
o Search, Information Access, or Question Answering (36%)
o Competitive intelligence (33%)
Seventy percent of users are Satisfied or Completely Satisfied with text/content
analytics, 24% are Neutral, and only 7% are Disappointed or Very Disappointed.
Dissatisfaction is greatest with ease of use: 25% are dissatisfied and only 36% satisfied.
Only 42% are satisfied with the availability of professional services/support.
Only 49% of users are likely to recommend their most important provider, and 28%
would recommend against it.
About the Study and the Report
Seth Grimes, an industry analyst and consultant who is a recognized authority on the
application of text analytics, designed and conducted the study “Text/Content Analytics
2011: User Perspectives on Solutions and Providers” and wrote this report.
The author is grateful for the support of the nine study sponsors, Verint, Sybase, SAS,
Medallia, Lexalytics, Language Computer Corporation, Basis Technology, Attensity, and
AlchemyAPI. Their sponsorships allowed him to conduct an editorially independent study
that should promote understanding of the text/content analytics market and of user-
indicated implementation and operations best practices. The solution profiles that follow
the report’s editorial matter were provided by the sponsors and included with only minor
editing to regularize their layout. Otherwise, the author is solely responsible for the
editorial content of this report, which was not reviewed by the sponsors prior to
publication.
Text and Content Analytics Basics
The term text analytics describes software and transformational processes that uncover
business value in “unstructured” text via the application of statistical, linguistic,
machine learning, and data analysis and visualization techniques. The aim is to improve
automated text processing, whether for search, classification, data and opinion
extraction, business intelligence, or other purposes.
Rough synonyms include text mining, text ETL, and semantic analysis. Terminology
choices are typically rooted in history and competitive positioning. Text mining is an
extension of data mining, and text ETL an extension of the BI world’s
extract-transform-load concept. Semantic analysis seems most often used by Semantic
Web aficionados, who sometimes use the broader term Semantic Web technologies, which
also covers standards and tools such as RDF, triple stores, query systems, and the like.
These text technologies all perform some form of natural language processing (NLP).
Content analytics can and should be seen as an extension of these capabilities to
images, audio and speech, video, and composites: the gamut of information types not
generated or held in data fields. (Some organizations use the content analytics label for
text analytics on online, social, and enterprise content, typically published information.
These organizations most often have a strong focus on enterprise content management
(ECM) systems.)
From Patterns…
Text, images, speech and other audio, and video are all directly understandable by
humans (although not universally: Any given human language – English, Japanese, or
Swahili – is spoken by a minority of people, and not everyone recognizes a Beethoven
symphony or Nelson Mandela in a photo). Understanding relies on three capabilities:
1) Ability to recognize small- and large-scale patterns.
2) Ability to grasp context and, from context, to infer meaning.
3) Ability to create and apply models.
Descriptive statistics provides an NLP starting point: The most frequently used words and
terms give an indication of the topics a message or document is about. We can create
categories and classify text (a form of modeling) based on notions of statistical similarity.
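To make that statistical starting point concrete, here is a minimal Python sketch, not any vendor's method. It assumes naive regular-expression tokenization, a tiny hand-picked stopword list, and two hypothetical categories defined by a few example terms each. It counts terms in a document and assigns the document to the category whose term profile is most similar by cosine similarity.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption); real systems use larger, language-specific lists.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "on", "i", "was", "is", "it"}


def term_counts(text: str) -> Counter:
    """Lowercase the text, split it into word tokens, drop stopwords, and count terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text: str, profiles: dict) -> str:
    """Return the label whose term profile is most similar to the text."""
    vec = term_counts(text)
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))


# Hypothetical category profiles built from a few example terms each.
profiles = {
    "billing": term_counts("invoice charge refund billing payment overcharged fee"),
    "support": term_counts("support agent call wait hold help resolved"),
}
doc = "Refund please: I was overcharged, and the refund for the overcharged fee never arrived."
print(term_counts(doc).most_common(2))  # the most frequent terms hint at the topic
print(classify(doc, profiles))
```

Raw counts keep the sketch readable; TF-IDF weighting or a trained classifier could replace them without changing the overall shape of the approach.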
Next steps take advantage of the linguistic structure of text, detectable by machines as
patterns. We have word form (“morphology”) and arrangement (grammar and syntax) as
well as higher-level narrative and discourse. Usage may be correct (as judged by editors,
grammarians, and linguists) or not, whether the language is spoken, formally written, or
texted or tweeted: The most robust technologies deal with text in the wild. We apply
assets such as lexicons of “named entities”; part-of-speech resolution that can help
identify subject, object, relationship, and attributes; and “word nets” that associate words
to aid disambiguation, that is, determining the contextual sense of terms that may have
different meanings in different contexts.
Yet, in the words of artificial-intelligence pioneer Edward A. Feigenbaum,
“Reading from text in general is a hard problem, because it involves all of
common sense knowledge. But reading from text in structured domains, I
don’t think is as hard.”
Some techniques therefore also apply knowledge representations such as ontologies to the
analysis task. All techniques, however, aim to generate machine-processable structure.
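As an illustration of the lexicon and word-net assets described above, the following Python sketch pairs a tiny gazetteer lookup with a simplified Lesk-style disambiguator, which picks the sense whose associated words overlap most with the surrounding text. The lexicon, the sense inventory, and every entry in them are hypothetical stand-ins for real resources such as WordNet. The sketch does not perform part-of-speech resolution.

```python
import re

# Hypothetical named-entity lexicon (gazetteer): surface form -> entity type.
LEXICON = {"tim cook": "PERSON", "cupertino": "LOCATION"}

# Hypothetical mini "word net": each sense of an ambiguous term lists associated words.
SENSES = {
    "apple": {
        "apple/company": {"iphone", "mac", "shares", "stock", "ceo", "launch"},
        "apple/fruit": {"pie", "orchard", "tree", "juice", "ripe", "cider"},
    },
}


def tokens(text: str) -> list:
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def find_entities(text: str) -> list:
    """Return (surface form, type) pairs for lexicon entries found as whole words."""
    lowered = text.lower()
    return [
        (name, etype)
        for name, etype in LEXICON.items()
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    ]


def disambiguate(term: str, text: str) -> str:
    """Simplified Lesk: choose the sense whose associated words overlap most with the context.

    Ties (including zero overlap) fall back to the first sense listed.
    """
    context = set(tokens(text)) - {term}
    senses = SENSES[term]
    return max(senses, key=lambda sense: len(senses[sense] & context))


news = "Tim Cook said Apple shares rose after the iPhone launch in Cupertino."
recipe = "She baked an apple pie with fruit from the orchard."
print(find_entities(news))
print(disambiguate("apple", news))
print(disambiguate("apple", recipe))
```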
… To Structure
NLP outputs, as part of a text-analytics system, are typically expressed in the form of
document annotations, that is, in-line or external tags that identify and describe features
of interest. Outputs may be mapped into machine-manageable data structures, whether
relational database records or XML, JSON, RDF, or another format.
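A minimal sketch of external (stand-off) annotation follows. It assumes two hypothetical regular-expression rules in place of a real NLP pipeline. Each annotation records a feature type, character offsets, and the covered text. The same record can be serialized as JSON or flattened into rows for a relational table.

```python
import json
import re

# Hypothetical extraction rules; real systems combine lexicons, grammars, and statistical models.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "MONEY": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def annotate(doc_id: str, text: str) -> dict:
    """Produce stand-off annotations: feature type, character offsets, and covered text."""
    annotations = [
        {"type": label, "start": m.start(), "end": m.end(), "text": m.group()}
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    annotations.sort(key=lambda a: a["start"])  # report features in document order
    return {"doc_id": doc_id, "annotations": annotations}


def to_rows(record: dict) -> list:
    """Flatten annotations into tuples, e.g. for insertion into a relational table."""
    return [
        (record["doc_id"], a["type"], a["start"], a["end"], a["text"])
        for a in record["annotations"]
    ]


text = "Refund of $42.50 requested on 2011-06-15."
record = annotate("survey-17", text)
print(json.dumps(record, indent=2))
print(to_rows(record))
```

Because offsets point back into the unchanged source text, the original document never has to be rewritten, which is the main attraction of external tags over in-line markup.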
Text-extracted data represented in the Semantic Web’s Resource Description Framework
(RDF) may form part of a Linked Data system. Text-derived information stored in a
relational database may become part of a business intelligence system that jointly
analyzes, for instance, DBMS-captured customer transactions and free-text responses to
customer-satisfaction surveys. And text-extracted features such as entities, topics, dates,
and measurement units may form the basis of advanced semantic search systems.
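To illustrate how text-extracted data might enter a Linked Data system, this Python sketch writes extracted entities as RDF statements in N-Triples syntax using plain string formatting. The example.org namespace and the `date` and `mentions` property names are hypothetical. `rdf:type` and `rdfs:label` are the standard W3C vocabulary terms. A production system would more likely use an RDF library and established vocabularies.

```python
EX = "http://example.org/"  # hypothetical namespace for this illustration
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def iri(local: str) -> str:
    """Build an IRI in the hypothetical example.org namespace."""
    return f"<{EX}{local}>"


def literal(value: str, datatype: str = "") -> str:
    """Quote a string literal, escaping backslashes, quotes, and newlines as N-Triples requires."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def to_ntriples(doc_id: str, doc_date: str, entities: list) -> str:
    """Emit one triple per line for a document and the (id, type, label) entities it mentions."""
    lines = [f"{iri(doc_id)} {iri('date')} {literal(doc_date, XSD_DATE)} ."]
    for ent_id, ent_type, label in entities:
        lines.append(f"{iri(doc_id)} {iri('mentions')} {iri(ent_id)} .")
        lines.append(f"{iri(ent_id)} {RDF_TYPE} {iri(ent_type)} .")
        lines.append(f"{iri(ent_id)} {RDFS_LABEL} {literal(label)} .")
    return "\n".join(lines)


print(to_ntriples("doc/survey-17", "2011-06-15", [("entity/acme", "Organization", 'Acme "Widgets" Inc.')]))
```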
Beyond Text
Beyond-text technologies for information extraction from images, audio, video, and
composite media exist but do not match NLP’s sense-making capabilities. Likely the most
developed is speech analysis, which supports phoneme-based indexing and search and can
detect emotion in speech by analyzing indicators such as pace, volume, and intonation;
applications include contact centers and intelligence, among others. Intelligence, along
with consumer and social search, motivates work on image analysis, as do marketing and
competitive-intelligence studies of online and social brand mentions and use. Video
analytics extends both speech and image analysis, adding a temporal dimension, for
security applications and for potential business uses such as studying in-store customer
behavior.
For beyond-text media, as for text, metadata is of critical importance.
Metadata
Metadata describes data properties that may include the provenance, structure, content,
and use of data points, datasets, documents, and document collections. Content-linked
metadata typically includes author, production and modification dates, title, topic(s),
keywords, format, language, encoding (e.g., character set), rights, and so on. The
metadata label extends to specialized annotations such as part-of-speech tags and data
types.
Metadata may be created as part of content production or publication (for instance, the
save date captured by a word processor, a geotag associated with a social update, camera
information stored in an image file). It may also be appended (for instance, via social
tagging) or extracted from content via text/content analysis. Whether stored internally
within a data object (for instance, via RDFa markup, the FOAF vocabulary, or microformats
embedded in a Web page) or managed externally, in a database or search index, metadata
is fuel for a range of applications.
A Focus on Applications
We will not devote further space in this report to discussion of text- and content-analysis
technology. If you want to learn more about text-analytics history and technology, see
the technology sections of Alta Plana’s 2009 study report, “Text Analytics
2009: User Perspectives on Solutions and Providers,” available online at
http://altaplana.com/TextAnalyticsPerspectives2009.pdf.
As a bridge to survey-derived reporting of user perceptions of the text and content
analytics market, solutions, and providers, we will look next at applications.
Applications and Markets
Business users naturally focus on business benefits, whether of analytics or of any other
technology or investment. Who are those users?
Text and content analytics solutions have a place in a) any business domain, b) any
business function, and c) any technology stack that would benefit from automated
text/content handling, that is, wherever text/content volume, velocity, and variety,
together with business urgency, are sufficient to justify costs. Consider a very telling
quotation from Philip Russom of the Data Warehousing Institute, who wrote in a 2007
report, “BI Search and Text Analytics: New Additions to the BI Technology Stack,”²
“Organizations embracing text analytics all report having an epiphany moment
when they suddenly knew more than before.”
In the analytics world, we now see that it is not enough to know more. You need to
understand how to use knowledge gained: the processes and outcomes necessary to turn
insights into return on investment (ROI). Text and content analytics elements –
information sources, insights sought, processes, and ROI measures – will vary by industry
and application.
In this report section, by way of lead-in to survey findings – applications, information
sources, and ROI measures are the subject of survey questions 2, 3, and 4 – we look at
the adaptation of text/content analytics for applications in several industries and for a
variety of business functions.
Application modes
Applications are diverse but may be classified in several (overlapping) groups. Our
categorization updates 2009’s, adding social and online applications in particular:
Media, knowledgebase, and publishing systems – the author includes search
engines here – use text and content analytics to generate metadata and to enrich
and index metadata and content in order to support content distribution and
retrieval. Semantic Web applications would fit in this category, as would
emerging information-access engines.
Content management systems – and, again, related search tools – use text
analytics to enhance the findability of content for business processes that include
compliance, e-discovery, and claims processing.
Line-of-business and supporting systems for functions such as compliance and
risk, customer experience management (CEM), customer support and service,
marketing and market research, human resources and recruiting… and newer
tasks that include social monitoring, measurement, and engagement.
Investigative and research systems for functions such as fraud, intelligence and
law enforcement, competitive intelligence, and science.
Where are these applications used?
Business Domains
Consider a sampling of industry domains where text and content analytics are frequently
applied:
In intelligence and counter-terrorism, and in law enforcement, there is broad
content variety – languages, format (text, audio, images, and video), sources
(news, field reports, communications intercepts, government records, social
postings) – and, at times, great urgency.
² http://www.teradata.com/assets/0/206/308/96d9065a-0240-44f1-b93c-17e08ae6eacc.pdf
In life sciences, for instance for pharmaceutical drug discovery, source materials
have been more uniform (scientific literature, clinical reports) and there is no
need for real-time response, yet information volumes are huge and complex, and
the potential payoff – years and millions of dollars shaved off lead-generation and
clinical-trials processes – is large enough to justify very significant investments in
text mining.
For financial services and insurance, effective credit, risk, fraud, and legal and
regulatory-compliance decision-making involves creation of predictive models via
analysis of large volumes of transactional records and often incorporates
information mined from text sources such as financial and news reports, e-mail
and corporate correspondence, insurance and warranty claims. Automated
methods are essential.
Market researchers rely on text analytics to hear and understand market voices.
Focus groups are (on their way) out: They are costly, slow, and often unreliable.
Surveys still have great value – beyond soliciting opinions, they can serve as an
engagement tool – but neither they nor focus groups help researchers hear
unprompted views, the attitudes that consumers express to their peers but not in
more formal research settings. Why text analytics? Social is hot, yet human
analysis, whether of surveys or of social postings, can be inconsistent and doesn’t
scale. Add in text analytics and you have next-generation market research.
As content delivery and consumption shift to digital, search and information-dissemination
tools that exploit metadata (whether publisher-produced, analytically generated, or
socially tagged) are essential survival tools for media and publishing organizations.
Content analytics creates better-targeted, richer content and a much friendlier and more
powerful experience for content consumers.
Online and social have fomented an advertising revolution. Targeting is the word,
whether based on behaviors (modeled via tracking and clickstream analysis) or on
analytically computed matching. Matches may draw on user profiles; context
(geography, accessing application, device or machine being used); inferred
intent (for instance, from search terms); and the semantic signatures of the
content where ads are to be delivered.
Text analytics provides essential capabilities in support of legal-domain e-discovery
mandates. Organizations must “produce” materials relevant to
lawsuits, a task that would often be impossible without automated text
processing, given huge volumes of electronically stored information generated in
the course of business. Intellectual property is another legal-domain application.
The task is to identify names, terminology, properties, and functions salient to an
IP search that seeks to identify, for instance, prior art and possible patent
infringement.
Business Functions
Many business tasks are independent of industry. Every organization of any significant
size has in-house customer support, marketing, product development, and similar
functions (even while definitions of customer, marketing, and product do still, of course,
vary by industry). Let’s examine the role text and content analytics play for the following:
Customer experience management (CEM) is a signal text/content analytics
success story. The aim is to transform customer relationship management (CRM),
which captures transactions and interactions, into a set of tools and practices that
cover the engagement span from customer acquisition to customer service and
support, first and foremost by listening and responding to the voice of the
customer across channels. In plain(er) English, CEM marries text- and speech-
sourced information – from e-mail, online forums, surveys, contact-center
conversations, and other touchpoints… and also from employee input – with
transactional and profile information. The hope is to improve customer
satisfaction and operate more efficiently and profitably. Simplistic, reductive
indicators such as the Net Promoter Score can only point at issues and challenges.
They can neither explain them nor suggest actions or remedies – insights that are
accessible (at enterprise scale) only via text and content analytics.
Marketers translate market-research and competitive-intelligence findings into
marketing campaigns and advertising and, in cooperation with product
developers, into higher quality, more satisfying products and services. It’s all
about listening.
Steve Rappaport, in his book Listen First!, says we should “Change the research
paradigm. Social media listening research should bring about an era of real-time
data that anticipates change and can be used to visualize and create a rewarding
business future,” as well as “rethink marketing, advertising, and media.” His
prescriptions about listening apply across channels and touchpoints, as they do
for CEM. The difference here, for research-related functions, is that we are looking at an
aggregate rather than an individualized picture, seeking to hear the voice of the market,
again aided by text and content analytics. Our aim is to
deliver targeted, compelling advertising via more effective marketing and, of
course, superior products and services that better meet customer needs.
Competitive intelligence, in particular, involves mining customer voices, at both
individual and aggregate levels, and also business information, for instance about
sales, personnel, alliances, and market conditions that indicate opportunities and
threats. The ability to extract domain- and sector-focused information from online and
social sources, and to integrate information from disparate sources in order to derive
coherent signals, is essential, and analytically rooted technologies deliver it.
Business intelligence (BI) was first defined, in the late 1950s, in terms of
extraction and reuse of knowledge drawn from textual sources.3 BI took off in a
different direction, however, starting in the late 1960s, centering on analysis of
numerical data captured in computerized corporate operational and transactional
systems. Back to the (1950s) Future: Number crunchers of all stripes recognize
the business value of information in text sources. They are seeking, with the help
of both major and niche BI and data warehousing vendors, to bring text-sourced
information into enterprise BI initiatives. Call this integrated analytics, also
incorporating geospatial and machine-generated Big Data to bring businesses a
step closer to the sought-after (although mythical) 360-degree view of the customer
(and the market and one’s own business).
3
Seth Grimes, “BI at 50 Turns Back to the Future,” InformationWeek, November 21, 2008:
http://www.informationweek.com/news/software/bi/211900005
Technology Domains
Last, for context, let’s briefly consider the technology domains where text and content
analytics come into play, namely semantics and the Semantic Web, and then look at
emerging text analytics applications.
Semantic Computing
First, to restate the definition: text/content analytics involves the acquisition, processing,
analysis, and presentation of enterprise, online, and social information derived from text
and rich-media sources. The technology is one route to semantics, to generating
machine-usable identification of information objects attached to databases, tables,
fields, and rows; to corpora, documents, and document content; and to media files,
e-mail, and text messages.
Text/content analytics provides a descriptive route to semantics, making sense of
information in-the-wild, as generated by humans (and machines) online, on social
platforms, and in everyday business and personal communications, whether written,
spoken, or captured in rich media. The alternative route to semantics is prescriptive:
semantics are generated or captured in the course of content creation, whether via
database export or via a plug-in to an authoring application.
The Semantics market includes technologies for the creation, management, and use of
artifacts such as taxonomies, ontologies, thesauruses, gazetteers, semantic networks,
controlled vocabularies, and metadata. These artifacts may be generated manually by
subject-matter experts. They may be generated automatically by text analytics. And in
many situations, a hybrid system involving manual curation of automatically generated
artifacts may be in order.
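To make the hybrid option concrete, here is a minimal Python sketch, not drawn from any particular product: an automatic pass proposes candidate vocabulary terms by counting frequent non-stop-word tokens, and a manual pass records a subject-matter expert’s accept/reject decisions. The stop-word list, the `min_count` threshold, and the function names `candidate_terms` and `curate` are illustrative assumptions; real systems extract multi-word phrases and use far richer statistics.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def candidate_terms(documents, min_count=2):
    """Automatic step: propose single-word terms occurring at least min_count times."""
    counts = Counter()
    for doc in documents:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return {term: n for term, n in counts.items() if n >= min_count}


def curate(candidates, decisions):
    """Manual step: decisions maps term -> True (accept) or False (reject).

    Returns (accepted terms, terms still awaiting review)."""
    accepted = sorted(t for t in candidates if decisions.get(t) is True)
    pending = sorted(t for t in candidates if t not in decisions)
    return accepted, pending


docs = [
    "The battery drains fast",
    "Battery life is poor",
    "Screen is great, battery poor",
]
candidates = candidate_terms(docs)
print(candidates)                             # {'battery': 3, 'poor': 2}
print(curate(candidates, {"battery": True}))  # (['battery'], ['poor'])
```

The split mirrors the manual/automatic/hybrid distinction above: the machine does the bulk counting, and a person decides what enters the controlled vocabulary.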
Semantics applications include digital content management, publishing, research, and
librarianship across a broad set of industrial and government settings. The semantics
market includes semantic search, whether open-domain, vertical (applied to a particular
information domain), or horizontal (applied in a particular business function). It also
includes classification and information integration.
Classification, Search, and Integration
Semantic computing finds its primary application in classification, search, and integration.
Classification determines what a data item or object represents, including how it may be
used, in relation to other data items and objects in a data space. This is, admittedly, an
abstract and not particularly practical definition. Information integration and search are
where semantics finds its most compelling applications. Semantic search is, in essence,
“search made smarter, search that seeks to boost accuracy by taming ambiguity via an
understanding of context.”4 Several approaches fit under the semantic search umbrella.
They include related searches, search-results enrichment, concept searches, faceted
search, and more. The common thread is better matching searcher intent (inferred from
search context including past searches and the searcher’s profile) to searched-for
information content. Semantic search, fueled by text and content analytics, is behind
many emerging search-based applications such as e-discovery, faceted navigation for
online commerce, and search-driven business intelligence. And it is captured semantics,
in the form of data identifiers and descriptions, that enables dynamic, adaptive
information integration, where join paths are discovered based on business and
application needs rather than hard-wired as in earlier computing generations.
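One simple way to approximate a concept search is query expansion against a thesaurus, one of the semantic artifacts listed earlier. The Python sketch below is illustrative only: the tiny `THESAURUS`, the whitespace tokenization, and the function names are assumptions, and production semantic search also weighs context such as past searches and the searcher’s profile.

```python
# Illustrative thesaurus: each term maps to the full set of terms for its concept.
CAR_CONCEPT = {"car", "automobile", "vehicle"}
THESAURUS = {term: CAR_CONCEPT for term in CAR_CONCEPT}


def expand(query_terms):
    """Replace each query term with every term that shares its concept."""
    expanded = set()
    for term in query_terms:
        expanded |= THESAURUS.get(term, {term})
    return expanded


def keyword_search(query, documents):
    """Literal matching: indices of documents sharing a word with the query."""
    terms = set(query.lower().split())
    return [i for i, doc in enumerate(documents) if terms & set(doc.lower().split())]


def concept_search(query, documents):
    """Same matching, but on the thesaurus-expanded query."""
    terms = expand(query.lower().split())
    return [i for i, doc in enumerate(documents) if terms & set(doc.lower().split())]


docs = ["Automobile recall announced", "New phone released", "Car sales rise"]
print(keyword_search("car", docs))  # [2]
print(concept_search("car", docs))  # [0, 2]
```

The expanded query also finds the “Automobile” document, which a purely literal match misses.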
4
Seth Grimes, “Breakthrough Analysis: Two + Nine Types of Semantic Search,” InformationWeek, January 21, 2010:
http://www.informationweek.com/news/software/bi/222400100
The Semantic Web
The Semantic Web is, at its root, an information-integration and sharing application, a set
of standards and protocols designed to facilitate the creation and use of a “Web of data.”
Eventually, the Semantic Web market will include tools and services that execute
knowledge-reliant business transactions over distributed, semantically infused data
spaces. We are years from that market.
The bulk of Semantic Web-focused expenditures go to government-funded research
projects at universities and similar institutions. Outside research contexts, business
implementations do not extend significantly beyond a) the use of microformats and RDFa
(Resource Description Framework–attributes) to allow Web-published structured data to
be indexed by search engines to facilitate information access and b) the use of RDF triples
as a convenient format for structuring facts for storage in DBMSes supporting graph-
database schemas to facilitate integration and query of data from disparate sources.
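The second business use, RDF triples as a format for integrating facts from disparate sources, can be sketched in plain Python. Everything here is hypothetical: the `acme:` and `geo:` identifiers, the predicate names, and the `match` helper. A real deployment would typically use an RDF store or a library such as rdflib and query with SPARQL. The point is that once two sources share an identifier, a join path emerges from the data rather than from a hard-wired schema.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Facts from two hypothetical sources: a CRM system and a social-media feed.
crm_facts: Set[Triple] = {
    ("acme:cust42", "hasName", "Jane Doe"),
    ("acme:cust42", "livesIn", "geo:Boston"),
}
social_facts: Set[Triple] = {
    ("acme:cust42", "mentioned", "acme:productX"),
    ("acme:productX", "hasSentiment", "negative"),
}


def match(store: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in store
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    }


def mentioned_sentiments(store: Set[Triple], customer: str) -> dict:
    """Join across sources: products a customer mentioned, with their sentiment."""
    results = {}
    for _, _, product in match(store, s=customer, p="mentioned"):
        for _, _, sentiment in match(store, s=product, p="hasSentiment"):
            results[product] = sentiment
    return results


# In this sketch, integrating the two sources is just a set union of triples.
store = crm_facts | social_facts
print(mentioned_sentiments(store, "acme:cust42"))  # {'acme:productX': 'negative'}
```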
At some point, however, perhaps in two to four years, the Semantic Web will reach a
tipping point where its business value, and the revenues generated by technology and
solutions sales, licensing, and support, will explode.
Value Today
At this time, text/content analytics delivers business value that is greater by far than the
value delivered by related semantic and Semantic Web technologies. This is because the
vast majority of subject information – text, images, audio, and video (a.k.a. content) – is in
“unstructured” form, just a string of bytes (and terms, in the case of text) so far as
software systems – Web browsers and office productivity tools, content management
systems, search engines – are concerned.
To make content tractable for business ends, for operational or analytical purposes or in
order to monetize content as a product, one must first create structure. To maximize
content usability, for most social and for many enterprise sources, generated structure
will take into account semantic information extracted from source materials. That is,
structure shouldn’t be arbitrary, a matter of sticking information into a set of round
pigeonholes for square-peg content.
This process of the discovery, extraction, and use of semantic information in content is the
domain of text/content analytics solutions.
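As a minimal illustration of creating structure, the Python sketch below uses a gazetteer (a list of known names with their types) to turn a sentence into structured annotation records with character offsets. The `GAZETTEER` entries and the `Annotation` record are assumptions for illustration; commercial tools combine dictionaries with statistical and linguistic models rather than relying on exact string matches.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    text: str
    entity_type: str
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


# Illustrative gazetteer: surface form -> entity type.
GAZETTEER = {"IBM": "Organization", "SPSS": "Organization", "Boston": "Location"}


def annotate(text, gazetteer=GAZETTEER):
    """Find whole-word gazetteer matches and return them in document order."""
    found = []
    for name, entity_type in gazetteer.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append(Annotation(m.group(0), entity_type, m.start(), m.end()))
    return sorted(found, key=lambda a: a.start)


for a in annotate("IBM acquired SPSS in 2009."):
    print(a)
```

Each record can then be loaded into a database table or index alongside conventional structured data.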
Solution Providers
The aggregate characteristics of the text and content analytics solution-provider spectrum
are little changed since 2009, although there has been significant turnover among players. We
still have, as reported in 2009, “a significant cadre of young pure-play software vendors,
software giants that have built or acquired text technologies, robust open-source projects,
and a constant stream of start-ups, many of which focus on market niches or specialized
capabilities such as sentiment analysis.”
The big change is in delivery mode. The market now favors as-a-service analytics, whether
in the form of online applications, cloud-provisioned services, or Web application
programming interfaces (APIs). This shift makes sense:
The most in-demand new information sources are online, social, and on-cloud.
Use of as-a-service, cloud, and via-API applications means low up-front
investment, faster time to use, and pay-as-you-go pricing without IT involvement.
Certain providers offer as-a-service access to both historical and current data at
attractive costs given the buy-once, sell-many-times economies they enjoy.
Modern applications are designed to draw data via APIs, which makes it easy to
include plug-in text and content analytics capabilities.
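The plug-in pattern in the last point can be sketched with Python’s standard library. The endpoint URL, the JSON request and response fields, and the bearer-token header are all hypothetical; every provider defines its own API. The sketch builds the request without sending it and parses a canned response, so it runs offline.

```python
import json
import urllib.request

# Hypothetical endpoint; real providers each define their own URL and schema.
API_URL = "https://api.example.com/v1/sentiment"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Package a document as a JSON POST request to the (hypothetical) service."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def parse_response(raw: str):
    """Extract (label, score) from an assumed response shape."""
    payload = json.loads(raw)
    return payload["sentiment"], float(payload["score"])


req = build_request("The new release is great", api_key="YOUR_KEY")
print(req.get_method(), req.full_url)
print(parse_response('{"sentiment": "positive", "score": 0.92}'))
```

Against a real service, `urllib.request.urlopen(req)` would send the request, and the response body would be read and passed to `parse_response`.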
There is every expectation that the solution-provider market will continue to evolve to
keep pace with user needs and broad-market business and technical trends.
Demand-Side Perspectives
Alta Plana designed a 2011 survey, “Text/Content Analytics demand-side perspectives:
users, prospects, and the market,” to collect raw material for an exploration of key text-
analytics market-shaping questions:
What do customers, prospects, and users think of the technology, solutions, and
vendors?
What works, and what needs work?
How can solution providers better serve the market?
Will companies expand their use of text analytics in the coming year? Will
spending on text/content analytics grow, decrease, or remain the same?
It is clear that current and prospective text/content-analytics users wish to learn how
others are using the technology, and solution providers of course need demand-side data
to improve their products, services, and market positioning, to boost sales and better
satisfy customers. The Alta Plana study therefore has two goals:
To raise market awareness and educate current and prospective users.
To collect information of value to solution providers, both study sponsors and
non-sponsors.
Survey findings, as presented and analyzed in this study report, provide a measure
of the state of the market, a benchmark of sorts. They are designed to be of use to
everyone who is interested in the commercial text/content-analytics market.
Study Context
The author previously explored market questions in a number of papers and articles.
These included white papers created for the Text Analytics Summit in 2005, “The
Developing Text Mining Market,”5 and in 2007, “What's Next for Text.”6
A systematic look at the demand side provides a good complement to provider-side views
and to vendor- and analyst-published case studies, including the author’s own. This
understanding motivated the 2009 study, “Text Analytics 2009: User Perspectives on
Solutions and Providers,” available for free download.7
That research was preceded by Alta Plana’s 2008 study report, “Voice of the Customer:
Text Analytics for the Responsive Enterprise,”8 published by BeyeNETWORK.com, a first
systematic survey of demand-side perspectives, albeit focused on a particular set of
business problems. VoC analysis is frequently applied to enhance customer support and
satisfaction initiatives, in support of marketing, product and service quality, brand and
reputation management, and other enterprise feedback initiatives.
About the Survey
There were 224 responses to the 2011 survey, which ran from June 6 to July 9, 2011.
(Contrast with 116 responses to the 2009 survey, which ran from April 13 to May 10,
2009.)
5
http://altaplana.com/TheDevelopingTextMiningMarket.pdf
6
http://altaplana.com/WhatsNextForText.pdf
7
http://altaplana.com/TA2009
8
http://altaplana.com/BIN-VOCTextAnalyticsReport.pdf
Survey invitations
The author solicited responses via
E-mail to the TextAnalytics, SentimentAI, Corpora, Lotico, BioNLP, Information-
Knowledge-Content-Management, and ContentStrategy lists and the author’s
personal list.
Invitations published in electronic newsletters: InformationWeek, BeyeNETWORK,
CMSWire, KDnuggets, AnalyticBridge, and Text Analytics Summit.
Notices posted to LinkedIn forums and Facebook groups and on Twitter.
Messages sent by sponsors to their communities.
Survey introduction
The survey started with a definition and brief description as follows:
Text Analytics / Content Analytics is the use of computer software or
services to automate
• annotation and information extraction from text – entities, concepts,
topics, facts, and attitudes,
• analysis of annotated/extracted information,
• document processing – retrieval, categorization, and classification,
and
• derivation of business insight from textual sources.
This is a survey of demand-side perceptions of text technologies,
solutions, and providers. Please respond only if you are a user, prospect,
integrator, or consultant. There are 21 questions. The survey should take
you 5-10 minutes to complete.
For this survey, text mining, text data mining, content analytics, and text
analytics are all synonymous.
I'll be preparing a free report with my findings. Thanks for participating!
Seth Grimes (grimes@altaplana.com, +1 301-270-0795)
The introduction ended with the text:
Privacy statement: This survey records your IP address, which we will use
only in an effort to detect bogus responses. It is your choice whether to
provide your name, company, and contact information. That information
will not be shared with sponsors without your permission, and if shared
with sponsors, it will not be linked to your survey responses.
Survey response
There is little question that the survey results overweight current text-analytics users
relative to the broad set of potential business, government, and academic users: 73% of
respondents who answered Q1, “How long have you been using Text Analytics?” (n=224),
report current use, as do 78% of respondents who replied to Q7, “Are you currently using
text/content analytics?” (n=206). (The difference in percentage is likely due to a higher
rate of survey abandonment among non-users. The figures contrast with 63% and 61% in
the 2009 survey.) So call this a Pac Man question, one whose response indicates very
significant survey selection bias:
Are you currently using text/content analytics? (n=206)
Yes    78.2%
No     21.8%
Market Size and the Larger BI Market
We can infer overweighting by comparing market-size figures. The author estimates an
$835 million 2010 global market for text/content-analytics software and vendor-supplied
support and services. As the author described in the May 12, 2011 InformationWeek
article “Text-Analytics Demand Approaches $1 Billion,”9
“My $835 million market-size estimate covers software licenses, service
subscriptions, and vendor-provided technical support and professional
services. Despite strong growth, it remains a small fraction of Gartner's
$10.5 billion 2010 valuation of the broader BI, analytics, and performance-
management software market.”10
By contrast, the 2009 text-analytics market report cited the author’s figure of $350 million
for the global, 2008 text analytics market. (That figure did not account for search-based
applications, which were included in the 2010 market-size estimate.) The 2009 report
also cited a 2008 BI-market estimate from research firm IDC: “The business intelligence
tools software market grew 6.4% in 2008 to reach $7.5 billion.”11
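A rough ratio makes the comparison concrete. It sets the author’s 2010 text/content-analytics estimate against Gartner’s 2010 figure for the broader BI, analytics, and performance-management market; the two figures come from different firms with different scopes, so treat the result as an order-of-magnitude indication only:

$$
\frac{\$835\ \text{million}}{\$10{,}500\ \text{million}} \approx 0.080 \approx 8\%
$$

On this reading, text/content analytics accounts for roughly 8% of spending in the broader market, while 73% to 78% of survey respondents report current use, a gap consistent with the survey overweighting current users.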
9
http://www.informationweek.com/news/software/bi/229500096
10
http://www.gartner.com/it/page.jsp?id=1642714
11
http://www.idc.com/getdoc.jsp?containerId=217443
The Data Mining Community
Another contrasting data point is that 65% of respondents to a July 2011 KDnuggets poll12
(n=121) reported using text analytics on projects in the preceding year. Results were
tallied nine days into the poll, before it was closed, so final numbers may differ from those
reported here.
The figure in a similar March 2009 poll was 55% currently using text analytics/text mining.
KDnuggets: How much did you use text analytics / text mining in the past 12 months?
Used on over 50% of my projects     21.5%
Used on 26-50% of my projects        9.9%
Used on 10-25% of projects          14.9%
Used on < 10% of my projects        19.0%
Did not use                         34.7%
KDnuggets reaches data miners, a technically sophisticated audience who are among the
most likely of any market segment to have embraced text analytics. The rate of
text-analytics adoption by data miners surely exceeds the rate of adoption by any other
user sector.
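The 65% figure follows from the poll distribution: it is the sum of every “Used on…” category. The Python sketch below reproduces that sum and back-calculates approximate respondent counts, assuming each percentage is a share of the 121 respondents tallied at the time; the function names are illustrative.

```python
# July 2011 KDnuggets poll distribution, in percent (n=121 at time of tally).
POLL = {
    "Used on over 50% of my projects": 21.5,
    "Used on 26-50% of my projects": 9.9,
    "Used on 10-25% of projects": 14.9,
    "Used on < 10% of my projects": 19.0,
    "Did not use": 34.7,
}


def used_share(dist):
    """Percent of respondents who used text analytics on at least some projects."""
    return sum(v for k, v in dist.items() if k != "Did not use")


def implied_counts(dist, n):
    """Approximate respondent counts, assuming percentages are shares of n."""
    return {k: round(v * n / 100) for k, v in dist.items()}


print(f"{used_share(POLL):.1f}%")  # 65.3%
counts = implied_counts(POLL, 121)
print(counts, sum(counts.values()))
```

The back-calculated counts sum to 121, which suggests the published percentages are internally consistent.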
As an aside, 49% of KDnuggets respondents stated that in comparison to the last 12
months, in the next 12 they would use text analytics more, whether on additional projects
or more intensively on a steady project workload. 43% stated their use would remain
about the same and only 8% anticipated less use.
12
http://www.kdnuggets.com/2011/07/poll-text-analytics-use.html
Demand-Side Study 2011: Response
The subsections that follow tabulate and chart survey responses, which are presented
without unnecessary elaboration.
Q1: Length of Experience
As in 2009, the 2011 survey opened with a basic question –
How long have you been using Text/Content Analytics?
                                         2009 (n=107)   2011 (n=224)
not using, no definite plans to use          16%             6%
currently evaluating                         22%            21%
less than 6 months                            8%             3%
6 months to less than one year                5%             5%
one year to less than two years               7%            12%
two years to less than four years            18%            20%
four years or more                           25%            33%
We see that 2011 responses skew toward longer experience than those measured in 2009.
However, neither the 2011 nor the 2009 survey was based on a scientifically designed or
measured population sample. Given how out of proportion survey-measured experience is
to that of the broad business population (the addressable market for text/content
analytics likely extends far beyond the current user base), the most plausible conclusion
one can draw from Q1 responses is that 2011 survey outreach failed to bring in the
proportion of new and prospective users reached in 2009. Nonetheless, Q1 responses will
prove illuminating in analyses of subsequent survey questions, in studying how attitudes
vary by length of text/content analytics experience.
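The Q1 percentages bear out both points. The short Python sketch below computes, for each year, the share of respondents who are current users (any length of use) and the share with two or more years of experience; the 2009 column sums to 101% because of rounding, and the grouping of categories is an editorial assumption rather than a survey definition.

```python
# Q1 response shares in percent, in the survey's category order.
CATEGORIES = [
    "not using, no definite plans to use",
    "currently evaluating",
    "less than 6 months",
    "6 months to less than one year",
    "one year to less than two years",
    "two years to less than four years",
    "four years or more",
]
Q1 = {
    2009: [16, 22, 8, 5, 7, 18, 25],
    2011: [6, 21, 3, 5, 12, 20, 33],
}


def share(dist, first_category):
    """Sum the shares from first_category through the last (longest) category."""
    return sum(dist[CATEGORIES.index(first_category):])


for year, dist in Q1.items():
    using = share(dist, "less than 6 months")
    veteran = share(dist, "two years to less than four years")
    print(year, f"using: {using}%", f"two years or more: {veteran}%")
```

The current-user shares, 63% for 2009 and 73% for 2011, match the figures cited in the survey-response discussion, and the two-years-or-more share rises from 43% to 53%.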
Q2: Application Areas
What are your primary applications where text comes into play?
                                                                  2011 (n=219)   2009 (n=103)
Brand/product/reputation management                                    39%            40%
Voice of the Customer / Customer Experience Management                 39%            33%
Search, information access, or Question Answering                      39%            36%
Research (not listed)                                                  33%             –
Competitive intelligence                                               33%            37%
Customer service/CRM                                                   26%            22%
Product/service design, quality assurance, or warranty claims          15%            14%
Life sciences or clinical medicine                                     15%            18%
E-discovery                                                            15%            15%
Online commerce including shopping, price intelligence, reviews        11%             –
Financial services/capital markets                                     10%            15%
Other                                                                   9%            13%
Insurance, risk management, or fraud                                    8%            17%
Content management or publishing                                        8%            19%
Military/national security/intelligence                                 7%             6%
Law enforcement                                                         7%             –
The 219 respondents in 2011 chose a total of 748 primary applications, an average of 3.4
primary applications per respondent. While there is some category overlap, it is notable
that respondents are applying text analytics toward multiple business needs.
Q3: Information Sources
What textual information are you analyzing or do you plan to analyze?
                                                              2011 (n=215)   2009 (n=100)
blogs and other social media                                       62%            47%
news articles                                                      41%            44%
on-line forums                                                     35%            35%
customer/market surveys                                            35%            34%
review sites or forums                                             30%            21%
e-mail and correspondence                                          29%            36%
scientific or technical literature                                 27%            27%
contact-center notes or transcripts                                23%            25%
Web-site feedback                                                  22%            21%
text messages/SMS/chat                                             21%             8%
employee surveys                                                   15%            16%
field/intelligence reports                                         14%             –
speech or other audio                                              14%             –
crime, legal, or judicial reports or evidentiary materials         12%            13%
medical records                                                    10%            16%
point-of-service notes or transcripts                               9%            12%
patent/IP filings                                                   9%            11%
photographs or other graphical images                               8%             –
insurance claims or underwriting notes                              7%            15%
video or animated images                                            6%             –
warranty claims/documentation                                       5%             7%
The 215 respondents in 2011 chose a total of 962 textual-information sources, an average
of 4.5 sources per respondent. The big news is not news at all: Social sources are by far
the most popular and 4 of the top 5 categories are social/online (as opposed to in-
enterprise) sources. Despite social’s status, however, it is a source for barely more than 6
out of 10 respondents.
Q4: Return on Investment
Question 4 asked, “How do you measure ROI, Return on Investment? Have you achieved
positive ROI yet?” There were 164 respondents. Results are charted from highest to
lowest values of the sum of “currently measure” and “plan to measure”:
How do you measure ROI, Return on Investment? (n=164)
                                                          Measure:   Measure:       Plan to
                                                          Achieved   Not Achieved   Measure
higher satisfaction ratings                                  19%         18%          28%
increased sales to existing customers                        13%         18%          29%
ability to create new information products                   11%         13%          27%
improved new-customer acquisition                             9%         15%          25%
higher customer retention/lower churn                        10%         12%          23%
higher search ranking, Web traffic, or ad response           10%         12%          22%
reduction in required staff/higher staff productivity         9%          9%          23%
fewer issues reported and/or service complaints               9%          6%          23%
lower average cost of sales, new & existing customers         5%          7%          23%
faster processing of claims/requests/casework                10%          6%          19%
more accurate processing of claims/requests/casework          6%          7%          20%
Out of 164 respondents, 37.8% (62) report that they have achieved positive ROI according
to some measure. Those 62 respondents reported achieving ROI according to a total of
182 measures, that is, 2.94 ROI-achieved measures for each respondent who achieved
positive ROI.
Out of 164 respondents, 50 are measuring ROI but have not yet achieved positive ROI
according to any measure.
The 112 respondents who are measuring ROI (whether achieved or not) track a total of
385 measures among them, 3.44 measures per respondent.
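The ROI ratios above can be checked with a few lines of Python. The sketch uses only figures reported in this section; as an additional cross-check, it assumes the chart’s “Measure: Achieved” percentages are shares of all 164 respondents, in which case they imply the 182 achieved measures.

```python
RESPONDENTS = 164
ACHIEVED = 62                # respondents reporting positive ROI on some measure
MEASURING_NOT_ACHIEVED = 50  # measuring, but no positive ROI yet
ACHIEVED_MEASURES = 182
TOTAL_MEASURES = 385

# "Measure: Achieved" column of the ROI chart, in percent of respondents.
ACHIEVED_PCT = [19, 13, 11, 9, 10, 10, 9, 9, 5, 10, 6]


def roi_summary():
    measuring = ACHIEVED + MEASURING_NOT_ACHIEVED
    return {
        "measuring": measuring,
        "share_achieved": ACHIEVED / RESPONDENTS,
        "measures_per_achiever": ACHIEVED_MEASURES / ACHIEVED,
        "measures_per_measurer": TOTAL_MEASURES / measuring,
        "implied_achieved_measures": round(sum(ACHIEVED_PCT) * RESPONDENTS / 100),
    }


summary = roi_summary()
print(f"{summary['share_achieved']:.1%}")         # 37.8%
print(f"{summary['measures_per_achiever']:.2f}")  # 2.94
print(summary["measuring"], summary["implied_achieved_measures"])  # 112 182
```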
The following are several of the Other responses given:
Better customer insight, market intelligence, and competitive intelligence.
Content findability.
Creation of scientific knowledge.
Higher employee engagement and better L&D outcomes.
Improvement in existing processes, turnover time.
Incremental sales lift.
Lowered cost of fraud, more accurate predictive analytics.
Number of action executives can take, estimated dollar savings from risk
correction/avoidance.
Patient outcomes.
Providing better data to scholars.
Reduction of Claim Cost.
Stronger understanding of subconscious emotional zones.
We don’t know how to measure it properly.
Q5: Mindshare
A word cloud, generated at Wordle.net, seemed a good way to present responses to the
query, “Please enter the names of companies that you know provide text/content
analytics functionality, separated by commas. List up to the first 8 that come to mind.”
There were 129 responses, many offering several companies. A bit of data cleansing was
done, to regularize names and remove inappropriate responses.
Contrast with the 2009 word cloud (deliberately rendered smaller than the 2011 cloud,
without an attempt to create sizing consistent between the two clouds) based on 48
response records, as follows:
Note that IBM acquired SPSS in mid-2009.
Q6: Spending
Question 6 asked about 2010 spending and 2011 expected spending.
How much did your organization spend in 2010, and how much do you expect to spend in
2011, on text/content analytics software/service solutions?
                                  2010 spent (n=176)   2011 expected (n=165)
$1 million or above                        6%                    7%
$500,000 to under $1 million               2%                    3%
$200,000 to $499,999                       4%                    6%
$100,000 to $199,999                       7%                    7%
$50,000 to $99,999                         9%                    7%
under $50,000                             23%                   30%
use open source                           15%                   19%
Questions asked of only current text/content-analytics users.
Questions 8 through 13 were posed exclusively to current text/content analytics users:
the 78.2% of the 206 respondents to Q7 (“Are you currently using text/content analytics?”)
who answered Yes.
Q8: Satisfaction
Question 8 asked, “Please rate your overall experience – your satisfaction – with text
analytics.” It offered five categories, listed here with response counts:
Overall experience/satisfaction (n=117, of whom 3 No experience/No opinion).
Ability to solve business problems (n=114, 12 NE/NO).
Solution/technology ease of use (n=112, 5 NE/NO).
Solution/technology performance (n=114, 4 NE/NO).
Availability of professional services/support (n=112, 13 NE/NO).
Responses, which across categories are somewhat anomalous, are as shown:
Chart: Please rate your overall experience – your satisfaction – with text/content analytics
(stacked responses for each category, on a scale of Very disappointed, Disappointed,
Neutral, Satisfied, and Completely satisfied)
Overall, 70% of current-user respondents who had an opinion reported themselves
Satisfied/Completely Satisfied, even while the breakout categories totaled 59%, 36%,
47%, and 42% Satisfied/Completely Satisfied. We can surmise that the respondents who
voiced “No experience/No opinion” for the breakout categories tended to have a
favorable overall experience.
Chart: Experience/satisfaction sentiment polarity (Positive, Neutral, Negative; radial scale
0% to 80%) for overall experience/satisfaction, ability to solve business problems,
solution/technology ease of use, solution/technology performance, and availability of
professional services/support
Q9: Overall Experience
Question 9 asked, “Please describe your overall experience – your satisfaction – with text
analytics.” The following are 49 from among the 63 responses, categorized, lightly edited
for spelling and grammar and with the names of three products masked:
Happy
It works.
Excellent.
Absolutely essential.
Very satisfied, most goals exceeded, big jump in effectiveness and customer
satisfaction.
Pretty happy given we are in a highly technical, difficult to monitor/track niche.
Saving a lot of time for our journalists.
We have found having an application with the capabilities to clean and normalize
the text and quantitative data, process it to a form to analyze, and run text mining
and categorization on an ad hoc or production basis has greatly enhanced my
team's capabilities and productivity.
We found great value from using a Speech Analytics solution to retain customers
and improve the overall customer experience through root-cause analysis.
I have been working with text analytics for academic and scientific purposes and I
am quite satisfied with results achieved.
I work with nurse and social science researchers. They think that a chat with 20
people is research. I tend to analyze hundreds or thousands of free-text comments.
I use software to overcome the biases inherent in manual analysis.
It Takes Work
Very powerful tool but requires the organization's ability to take action on the
insights.
Valuable tool; my clients are content to underutilize it, so what is available more
than meets our needs.
Since we use open source, the ROI is basically how much time you put into the
solution and how many problems it solves. We have been successful so far.
Very Satisfied but extremely labor intensive
We provide this as a tool to our clients in our application for publishing press
releases. It works fine but could be better but that is up to us to implement it fully.
Once you spend man hours to set up the tool, it is extremely consistent on doing
what you tell it to do. I know improvements are coming but I'd like more AI from
text analytics tools than what is currently offered.
Do-It-Yourself is challenging but not impossible. Very cheap to operate.
Fairly satisfied – problem is I am sole researcher and data/text clean-up takes too
much time given other demands.
I've been a user and vendor of text analytics (in fact, in my early <...> days, we
helped coin the phrase “text analytics”). Vendors generally overpromise and have
difficulty delivering. Both vendors and customers underestimate the amount of
resources required to get it right. So, still hard to use for mainstream purposes.
Reservations and complications
Steep learning curve.
I am currently satisfied, but I believe we (as analysts) are just beginning to fully
unlock the full potential of text analytics.
On one hand, I'm amazed and thrilled that this stuff exists at all. But on the other
hand, I haven't seen anything that does just what I want it to do.
It's opened up opportunities to analyze unstructured data but not at the same level
as structured data.
Works well at highest level of analysis (e.g. sentiment) but not as well in auto-
coding for custom (i.e. project) studies.
Tools are good, but lack transparency, ability to explain how conclusions are
reached.
There is still a lot of work required to optimize this technology since it can currently
provide concepts but does not capture context and it’s a lot of slow painful work to
get the software to recognize context in which something is mentioned and
accuracy is still not a lot.
Unmet needs
Very promising technology but some difficulties to
- Implement smoothly text mining component into existing information system.
- Cope with various languages, formats, volumes, etc. of data.
- Measure and demonstrate tangible results in terms of improved information
extraction quality.
- Assess ROI (reducing processing time / saving resources for core tasks e.g.
analysis).
Powerful but overly difficult, impenetrable - technology vs. solutions.
An emerging and enabling technology in our business with broad applicability.
Satisfied in our applications with accuracy and precision but hitherto disappointed
with export capability to other applications.
Still a volatile market for applications beyond VOC/sentiment analysis. Vendors are
eager to please but sometimes overstate the capabilities. However, I still have
limited experience in solving real business problems with these tools (I am a
consultant).
I think this field is in its infancy. Lots of issues with data quality. Sentiment
analytics often flawed. Hard to scale or automate.
The handful of companies and solutions I came across do not seem to marry or
integrate structured and unstructured text easily... Algorithms are not quite
available as a function or way to improve accuracy.
I feel there is so much more work to be done both on the analysis side and also on
the business implementation side. While I work heavily in this area, I won't be
more satisfied until I see better end-to-end integration and until I see more
effective and systematic use of insights.
I do everything myself. The lack of good lexical resources and taxonomies is a real
problem that drives up the cost (in manpower) of providing a solution. And the
complexity of the infrastructure required vs. the apparent simplicity of the
problem (in managers' minds) makes it very difficult to adjust expectations.
We use <...> and we have to write our own routines to find the text and content
that we are interested in. There are plenty of functions that help us with our goals
but obviously there is still much that we need to do to achieve higher recall and accuracy.
<...> is the only tool which is both open source and professionally useful. However
in spite of 20 years of development, it still has a very poor user interface as well as
API interface which hinder productivity and acceptance at a beginner's level.
Skepticism
Jury is still out.
It’s still evolving, accuracy of results something to watch for in iterations.
Still learning.
Very early days!
Promising but still very difficult to see quick results. Everything seems to take ages
and it’s been a painful learning curve.
Hard to trust the automated results when you've been used to achieving 100% with
manual human analysis.
Still too new.
Field as a whole is underperforming what is possible.
Though the concept is very appealing, it is still in its nascent stages, and a lot more
possibilities are left to be explored. IBM Watson is a good step ahead in that
direction.
Very poor, almost useless.
Looking ahead
On the whole, very satisfied with the range of solutions available and their ease of
use. Very much looking forward to watching the technology progress – it's
obviously not perfect yet.
Unlike structured data, getting value out of text analytics tools requires
understanding of text elements – how to utilize occurrence of different parts of
speech, how to interpret different types of sentences like requests, commands,
opinionated sentences, etc. Domain knowledge and tunable and adaptable
systems are a must for success. Non-availability of trained personnel to provide
text mining services leads to dissatisfaction of users. Business end users do not like
to use the tools themselves because of the complexity. The process or strategy for
text mining needs to be established.
We're pretty happy with text analytics and see it as a transformational technology.
Most of text analytics' problems lie in how it is sold. It is both broad and deep and
has a myriad of tools best suited for very different use cases, but customers think
"text analytics is text analytics." Really, “text analytics” is a horrible term that
needs to be broken up into component parts.
Q10: Providers
Question 10 asked, “Who is your provider? Enter one or more, separated by commas,
most important provider first.” There were 77 response records, listing providers (sorted
and without counts):
Autonomy, Clarabridge, Colbenson, Content Analyst, Expert System,
GATE, IBM, in-house, Lexalytics, LingPipe, Megaputer, MotiveQuest,
open source, Open Text, R, Radian6, Rapid-I, Saplo, SAS, Smartlogic,
Sysomos, TEMIS, Teradata, TextKernel, Thomson Reuters (including
Calais, ClearForest), Verint, Zemanta
Note that the survey asked, “Please respond only if you are a user, prospect, integrator, or
consultant.”
Q11: Provider Selection
Question 11 asked, “How did you identify and choose your provider? (If more than one,
limit response to your most important provider.)”
Applicability, robust performance, open source.
Research.
Experience and luck.
Very satisfied reference customers with similar applications, most flexible
solutions, expertise of consultants, high quality of service, extreme agility, and
extremely rapid idea-to-deployment cycles.
They contacted us before launch of their first product.
Product evaluation in context of business application.
Based on business requirements in the framework of a European competitive
tender procedure.
Advised by a related Web development consultant.
We spent about a year evaluating and classifying vendors that in part or whole
would fill our needs as expressed in Q9. We decided on using an application with
integrated quantitative and qualitative analytic capabilities as the best
possibilities. We ended up doing POCs with SAS, SPSS and Megaputer, and ended
up choosing the latter.
We evaluated multiple providers based on (1) tool flexibility – can we customize?
(2) accuracy (3) type of content it can tag (4) sentiment methodology (5) price.
Main criteria are cost, multi-language capability, and integration with SAS.
Competitive bids.
Large existing analytics relationship: Tool was an add on.
Conducted a thorough investigation of leading providers in the space.
Quality and reviews.
Personal recommendations.
Constructed a needs analysis ranking system. Our needs included ease of
integration, tools, ability to produce meaningful results at sub-document (short
document) level, ease of (or no) training.
Networking and academic partnerships.
Proof of Concept – evaluated about a dozen or so vendors – have not selected a TM
vendor as yet.
Recommended by a trusted source.
Based on recommendations <...> and our own search / lab testing, which brought
us to <...>.
Introduction from my manager.
What my client uses.
It was an obvious choice since there was no real alternative on the market (i.e.
language is limiting the products).
We compared a number of providers and decided to go for <...> that have a local
presence and are experts on the Swedish language.
Free, for research purposes.
Trials based on performance.
Price/performance tradeoff and applicability to targeted business problem.
I work for the company. Use other languages (Perl) as necessary.
Tested various services, rated results.
I choose the vendor or tools based upon my client application needs.
We do not have a primary provider ... we maintain a library of tools and use many
of them in the same project.
Trying all major ones.
Reputation, personal contacts.
Established open-source project.
Market research, pricing, case studies and product evaluations.
It was recommended to us.
Working in-house.
I worked for one of them and selected the other on their open source commitment.
Proof of Concept.
Support for Drupal.
Cost, applicability to needs.
I don't, that's up to my clients. But my advice to them is to begin with an
understanding of the goals, and work backward to identify the provider.
Company demo.
Recommendation from experts, and tried and tested different ones.
Management mandate, client.
RFP to replace an existing legacy system.
We already used <...> and had everything we needed to do Proof of Concept;
waiting for business reason to acquire <...>.
Advanced, scalable LSI technology.
Work for them.
Ability to mine audio, text, and customer surveys.
Q13: Promoter?
Question 13 is new with the 2011 survey; we did not ask it in 2009. It is a basic net-
promoter type question, without the “net” part: “How likely are you to recommend your
most important provider to others who are looking for a text/content analytics solution?”
Of 87 responses, 49% were positive, 23% were neutral, and 28% were negative.
[Figure: "How likely are you to recommend your most important provider?" Pie chart of Q13 responses on a seven-point scale, from "Extremely likely to recommend against" through "Neither likely to recommend nor recommend against" to "Extremely likely to recommend."]
Promoters outweigh detractors by a net of 21 percentage points (49% positive minus 28% negative).
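The net figure is the positive share minus the negative share, with neutral responses left out. A minimal Python sketch of that arithmetic, using the three aggregate buckets reported for the 87 Q13 responses (the dictionary keys and the `net_score` name are illustrative assumptions, not part of the survey instrument):

```python
# Aggregate Q13 shares, in percent of the 87 responses, as reported in the text.
# Bucket names are illustrative labels chosen for this sketch.
shares = {"positive": 49, "neutral": 23, "negative": 28}


def net_score(shares: dict[str, int]) -> int:
    """Return positive share minus negative share; neutral responses are ignored."""
    return shares["positive"] - shares["negative"]


# Sanity check: the three buckets should account for all responses.
assert sum(shares.values()) == 100

print(net_score(shares))  # → 21
```

This follows the report's simplified, three-bucket approach; a conventional Net Promoter Score instead buckets 0-10 ratings into promoters (9-10) and detractors (0-6) before subtracting.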